Abstract

We study the qualitative properties of optimal regularisation parameters in variational models for image 
restoration. The parameters are solutions of bilevel optimisation problems with the image restoration 
problem as constraint. A general type of regulariser is considered, which encompasses total variation 
(TV), total generalized variation (TGV) and infimal-convolution total variation (ICTV). We prove that 
under certain conditions on the given data optimal parameters derived by bilevel optimisation problems 
exist. A crucial point in the existence proof turns out to be the boundedness of the optimal parameters
away from 0, which we prove in this paper. The analysis is done on the original variational problem, which in
image restoration is typically non-smooth, as well as on a smoothed approximation set in Hilbert space,
which is the one considered in numerical computations. For the smoothed bilevel problem we also prove
that it $\Gamma$-converges to the original problem as the smoothing vanishes. All analysis is done in function
spaces rather than on the discretised learning problem.

1. Introduction 

In this paper we consider the general variational image reconstruction problem that, given parameters
$\alpha = (\alpha_1, \ldots, \alpha_N)$, $N \ge 1$, aims to compute an image

$$u_\alpha \in \arg\min_{u \in X} J(u; \alpha).$$

The image depends on $\alpha$ and belongs in our setting to a generic function space $X$. Here $J$ is a
generic energy modelling our prior knowledge on the image $u_\alpha$. The quality of the solution $u_\alpha$ of
variational imaging approaches like this one crucially relies on a good choice of the parameters $\alpha$. We
are particularly interested in the case

$$J(u; \alpha) = \Phi(Ku) + \sum_{j=1}^N \alpha_j \|A_j u\|_j,$$

with $K$ a generic bounded forward operator, $\Phi$ a fidelity function, and $A_j$ linear operators acting
on $u$. The values $A_j u$ are penalised in the total variation or Radon norm $\|\mu\|_j = \|\mu\|_{\mathcal{M}(\Omega; \mathbb{R}^{m_j})}$, and
combined constitute the image regulariser. In this context, $\alpha$ represents the regularisation parameter
that balances the strength of regularisation against the fitness of the solution to the idealised forward
model $K$. The size of this parameter depends on the level of random noise and the properties of the
forward operator. Choosing it too large results in over-regularisation of the solution and in turn may
cause the loss of potentially important details in the image; choosing it too small under-regularises the
solution and may result in a noisy and unstable output. In this work we will discuss and thoroughly
analyse a bilevel optimisation approach that is able to determine the optimal choice of $\alpha$ in $J(\cdot; \alpha)$.

Recently, bilevel approaches for variational models have gained increasing attention in image processing
and inverse problems in general. Based on prior knowledge of the problem in terms of a training
set of image data and corresponding model solutions or knowledge of other model determinants such as 
the noise level, optimal reconstruction models are conceived by minimising a cost functional - called 
F in the sequel - constrained to the variational model in question. We will explain this approach 
in more detail in the next section. Before, let us give an account of the state of the art of bilevel 
optimisation for model learning. In machine learning bilevel optimisation is well established. It is a 
semi-supervised learning method that optimally adapts itself to a given dataset of measurements and 
desirable solutions. In [41, 42, 27, 28, 18, 19], for instance, the authors consider bilevel optimisation
for finite dimensional Markov random field (MRF) models. In inverse problems the optimal inversion 
and experimental acquisition setup is discussed in the context of optimal model design in works by 
Haber, Horesh and Tenorio [33, 31, 32], as well as Ghattas et al. [13, 8]. Recently, parameter learning
in the context of functional variational regularisation models also entered the image processing
community with works by the authors [23, 14], Kunisch, Pock and co-workers [36, 17] and Chung et al.
[20]. A very interesting contribution can be found in a preprint by Fehrenbach et al. [30] where the
authors determine an optimal regularisation procedure introducing particular knowledge of the noise 
distribution into the learning approach. 

Apart from the work of the authors [23, 14], all approaches for bilevel learning in image processing
so far are formulated and optimised in the discrete setting. Our subsequent modelling, analysis and
optimisation will be carried out in function space rather than on a discretisation of the variational
model. In this context, a careful analysis of the bilevel problem is of great relevance for its application
in image processing. In particular, the structure of optimal regularisers is important, among others,
for the development of solution algorithms. For instance, if the parameters are bounded and lie in
the interior of a closed connected set, then efficient optimisation methods can be used for solving
the problem. Previous results on optimal parameters for inverse problems with partial differential 
equations have been obtained in, e.g., [16]. 

In this paper we study the qualitative structure of regularisation parameters arising as solutions
of bilevel optimisation problems for variational models. In our framework the variational models are
typically convex but non-smooth and posed in Banach spaces. The total variation and total generalised
variation regularisation models are particular instances. Alongside the optimisation of the non-smooth
variational model, we also consider a smoothed approximation in Hilbert space, which is typically
the one considered in numerical computation. Under suitable conditions, we prove that, for both
the original non-smooth optimisation problem and the regularised Hilbert space problem,
the optimal regularisers are bounded and lie in the interior of the positive orthant. The conditions
necessary to prove this turn out to be very natural conditions on the given data in the case of an
$L^2$ cost functional $F$. Indeed, for the total variation regularisers with $L^2$-squared cost and fidelity, we will
merely require

$$\mathrm{TV}(f) > \mathrm{TV}(f_0),$$

with $f_0$ the ground-truth and $f$ the noisy image. That is, the noisy image should oscillate more,
in terms of the total variation functional, than the ground-truth. For second-order total generalised
variation [11], we obtain an analogous condition. Apart from the standard $L^2$ costs, we also discuss
costs that constitute a smoothed $L^1$ norm of the gradient of the original data (we will call this
the Huberised total variation cost in the sequel), typically resulting in optimal solutions superior to
the ones minimising an $L^2$ cost. For this case, however, the interior property of optimal parameters
could be verified for a finite-dimensional version of the cost only. Eventually, we also show that as
the numerical smoothing vanishes, the optimal parameters for the smoothed models tend to optimal
parameters of the original model.
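The data condition $\mathrm{TV}(f) > \mathrm{TV}(f_0)$ is easy to test on sampled images. The following is a minimal sketch, assuming a forward-difference isotropic discretisation of the total variation; the function names and the checkerboard "noise" are illustrative choices and not part of the paper's function-space analysis.

```python
import numpy as np

def total_variation(img):
    """Isotropic discrete TV using forward differences (zero difference at the far edges)."""
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    return np.sqrt(dx ** 2 + dy ** 2).sum()

def noisy_oscillates_more(f, f0):
    """Discrete analogue of the condition TV(f) > TV(f0)."""
    return total_variation(f) > total_variation(f0)

# Hypothetical test data: a square on a flat background, corrupted by a
# deterministic checkerboard perturbation standing in for noise.
f0 = np.zeros((16, 16))
f0[4:12, 4:12] = 1.0
rows, cols = np.indices(f0.shape)
f = f0 + 0.1 * (-1.0) ** (rows + cols)

print(noisy_oscillates_more(f, f0))
```

A perturbation that smooths the ground-truth instead of roughening it would fail this check, which is the situation in which TV regularisation cannot be expected to help.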

The results derived in this paper are motivated by problems in image processing. However, their 
applicability goes well beyond that and can be generally applied to parameter estimation problems of 
variational inequalities of the second kind, for instance the parameter estimation problem in Bingham 
flow [22]. Previous analysis in this context either required box constraints on the parameters in order to 
prove existence of solutions or the addition of a parameter penalty to the cost functional [3, 6, 7, 37]. In 
this paper, we require neither but rather prove that under certain conditions on the variational model 
and for reasonable data and cost functional, optimal parameters are indeed positive and guaranteed 
to be bounded away from $0$ and $\infty$. As we will see later this is enough for proving existence of
solutions and continuity of the solution map. The next step from the work in this paper is deriving
numerically useful characterisations of solutions to the ensuing bi-level programs. For the most basic
problems considered herein this has been done in [37] under numerical regularisation. We will
consider in the follow-up work [24] the optimality conditions for higher-order regularisers and the new
cost functionals introduced in this work. For an extensive characterisation of optimality systems for
bi-level optimisation in finite dimensions, we point the reader to [26] as a starting point.

Outline of the paper. In Section 2 we introduce the general bilevel learning problem, stating assumptions
on the setup of the cost functional $F$ and the lower level problem given by a variational
regularisation approach. The bilevel problem is discussed in its original non-smooth form (P) as well
as in a smoothed form (P$^{\gamma,\epsilon}$) in a Hilbert space setting in Section 2.2, which will be the one used in
the numerical computations. The bilevel problem is put in context with parameter learning for non-smooth
variational regularisation models, typical in image processing, by proving the validity of the
assumptions for examples such as TV, TGV and ICTV regularisation. The main results of the paper,
the existence of positive optimal parameters for $L^2$, Huberised TV and $L^1$-type costs and the convergence
of the smoothed numerical problem to the original non-smooth problem, are stated in Section 3.
Auxiliary results, such as coercivity, lower semicontinuity and compactness results for the involved
functionals, are the topic of Section 4. Proofs for existence and convergence of optimal parameters are
contained in Section 5. The paper finishes with a brief numerical discussion in Section 6.

2. The general problem setup 

Let $\Omega \subset \mathbb{R}^n$ be an open bounded domain with Lipschitz boundary. This will be our image domain.
Usually $\Omega = (0, w) \times (0, h)$ for $w$ and $h$ the width and height of a two-dimensional image, although no
such assumptions are made in this work. Our noisy or corrupted data $f$ is assumed to lie in a Banach
space $Y$, which is the dual of $Y_*$, while our ground-truth $f_0$ lies in a general Banach space $Z$. Usually,
in our model, we choose $Z = Y = L^2(\Omega; \mathbb{R}^d)$. This holds for denoising or deblurring, for example,
where the data is just a corrupted version of the original image. Further $d = 1$ for grayscale images,
and $d = 3$ for colour images in typical colour spaces. For sub-sampled reconstruction from Fourier
samples, we might use a finite-dimensional space $Y = \mathbb{C}^n$; there are however some subtleties with
that, and we refer the interested reader to [1].

As our parameter space for regularisation functional weights we take

$$\mathcal{P}_\alpha^\infty := [0, \infty]^N,$$

where $N$ is the dimension of the parameter space, that is $\alpha = (\alpha_1, \ldots, \alpha_N)$, $N \ge 1$. Observe that
we allow infinite and zero values for $\alpha$. The reason for the former is that in case of $\mathrm{TGV}^2$, it is not
reasonable to expect that both $\alpha_1$ and $\alpha_2$ are bounded; such conditions would imply that $\mathrm{TGV}^2$
performs better than both $\mathrm{TV}^2$ and $\mathrm{TV}$. But we want our learning algorithm to find out whether that
is the case! Regarding zero values, one of our main tasks is proving that for reasonable data, optimal
parameters in fact lie in the interior

$$\operatorname{int} \mathcal{P}_\alpha^\infty = (0, \infty]^N.$$

This is required for the existence of solutions and the continuity of the solution map parametrised by
additional regularisation parameters needed for the numerical realisation of the model. We also set

$$\mathcal{P}_\alpha := [0, \infty)^N$$

for some occasions when we need a bounded parameter.

Remark 2.1. In much of our treatment, we could allow for spatially dependent parameters $\alpha$. However,
the parameters would need to lie in a finite-dimensional subspace of $C_0(\Omega; \mathbb{R}^N)$ in our theory.
Minding our general definition of the functional $J(\cdot; \alpha)$ below, no generality is lost by taking $\alpha$
to be vectors in $\mathbb{R}^N$. We could simply replace the sum in the functional by a larger sum modelling
integration over parameters with values in a finite-dimensional subspace of $C_0(\Omega; \mathbb{R}^N)$.


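In our general learning problem, we look for $\alpha = (\alpha_1, \ldots, \alpha_N)$ solving for some convex, proper,
weak* lower semicontinuous cost functional $F : X \to \mathbb{R}$ the problem

$$\min_{\alpha \in \mathcal{P}_\alpha^\infty} F(u_\alpha) \tag{P}$$

subject to

$$u_\alpha \in \arg\min_{u \in X} J(u; \alpha). \tag{D$_\alpha$}$$

Here we denote for short the total variation norm

$$\|\mu\|_j := \|\mu\|_{\mathcal{M}(\Omega; \mathbb{R}^{m_j})}, \quad (\mu \in \mathcal{M}(\Omega; \mathbb{R}^{m_j})).$$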

The following covers our assumptions with regard to $A$, $K$, and $\Phi$. We discuss various specific examples
in Section 2.1 immediately after stating the assumptions. 

Assumption A-KA (Operators $A$ and $K$). We assume that $Y$ is a Banach space, and $X$ a normed
linear space, both themselves duals of $Y_*$ and $X_*$, respectively. We then assume that the linear operators

$$A_j : X \to \mathcal{M}(\Omega; \mathbb{R}^{m_j}), \quad (j = 1, \ldots, N),$$

and

$$K : X \to Y$$

are bounded. Regarding $K$, we also assume the existence of a bounded right-inverse $K^\dagger : \mathcal{R}(K) \to X$,
where $\mathcal{R}(K)$ denotes the range of the operator $K$. That is, $K K^\dagger = I$ on $\mathcal{R}(K)$. We further assume
that

$$\|u\|'_X := \sum_{j=1}^N \|A_j u\|_j + \|Ku\|_Y \tag{2.1}$$

is a norm on $X$, equivalent to the standard norm. In particular, by the Banach-Alaoglu theorem, any
sequence $\{u^i\}_{i=1}^\infty \subset X$ with $\sup_i \|u^i\|_X < \infty$ possesses a weakly* convergent subsequence.

Assumption A-$\Phi$ (The fidelity $\Phi$). We suppose $\Phi : Y \to (-\infty, \infty]$ is convex, proper, weakly* lower
semicontinuous, and coercive in the sense that

$$\Phi(v) \to +\infty \quad \text{as} \quad \|v\|_Y \to +\infty. \tag{2.2}$$

We assume that $0 \in \operatorname{dom} \Phi$, and the existence of $f \in \arg\min_{v \in Y} \Phi(v)$ such that $f = K \bar f$ for some
$\bar f \in X$. Finally, we require that either $K$ is compact, or $\Phi$ is continuous and strongly convex.

Remark 2.2. When / exists, we can choose / = f. 

Remark 2.3. Instead of $0 \in \operatorname{dom} \Phi$, it would suffice to assume, more generally, that
$\operatorname{dom} \Phi \cap \bigcap_{j=1}^N \ker A_j \neq \emptyset$.

We also require the following technical assumption on the relationship of the regularisation terms
and the fidelity $\Phi$. Roughly speaking, in most interesting cases, it says that for each $\ell$, we can closely
approximate the noisy data $f$ with functions of order $\ell$. But this is in a lifted sense, not directly in
terms of derivatives.

Assumption A-$\delta$ (Order reduction). We assume that for every $\ell \in \{1, \ldots, N\}$ and $\delta > 0$, there exists
$\bar f_{\delta,\ell} \in X$ such that

$$\|A_\ell \bar f_{\delta,\ell}\|_\ell < \infty, \quad \text{and}$$

(2.3a)

(2.3b)


2.1. Specific examples 


We now discuss a few examples to motivate the abstract framework above. 

Example 2.1 (Squared $L^2$ fidelity). With $Y = L^2(\Omega)$, $f \in Y$, and

$$\Phi(v) = \frac{1}{2}\|f - v\|_{L^2(\Omega)}^2,$$

we recover the standard $L^2$-squared fidelity, modelling Gaussian noise. On a bounded domain $\Omega$,
Assumption A-$\Phi$ follows immediately.

Example 2.2 (Total variation denoising). Let us take $K_0$ as the embedding of $X = BV(\Omega) \cap L^2(\Omega)$
into $Z = L^2(\Omega)$, and $A_1 = D$. We equip $X$ with the norm

$$\|u\|_X := \|u\|_{L^2(\Omega)} + \|Du\|_{\mathcal{M}(\Omega; \mathbb{R}^n)}.$$

This makes $K_0$ a bounded linear operator. If the domain $\Omega$ has Lipschitz boundary, and the dimension
satisfies $n \in \{1, 2\}$, the space $BV(\Omega)$ continuously embeds into $L^2(\Omega)$ [2, Corollary 3.49]. Therefore,
we may identify $X$ with $BV(\Omega)$ as a normed space. Otherwise, if $n \ge 3$, without going into the details
of constructing $X$ as a dual space,¹ we define weak* convergence in $X$ as combined weak* convergence
in $BV(\Omega)$ and $L^2(\Omega)$. Any bounded sequence in $X$ will then have a weak* convergent subsequence.
This is the only property we would use from $X$ being a dual space.

Now, combined with Example 2.1 and the choice $K = K_0$, $Y = Z$, we get total variation denoising
for the sub-problem (D$_\alpha$). Assumption A-KA holding is immediate from the previous discussion.
Assumption A-$\delta$ is also easily satisfied, as with $f \in BV(\Omega) \cap L^2(\Omega)$, we may simply pick $\bar f_{\delta,1} = f$
for the only possible case $\ell = 1$. Observe however that $K$ is not compact unless $n = 1$, see [2,
Corollary 3.49], so the strong convexity of $\Phi$ is crucial here. If the data $f \in L^\infty(\Omega)$, then it is well-known
that solutions $u$ to (D$_\alpha$) satisfy $\|u\|_{L^\infty(\Omega)} \le \|f\|_{L^\infty(\Omega)}$. We could therefore construct a compact
embedding by adding some artificial constraints to the data $f$. This changes in the next two examples,
as boundedness of solutions for higher-order regularisers is unknown; see also [43].

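Example 2.3 (Second order total generalised variation denoising). We take

$$X = (BV(\Omega) \cap L^2(\Omega)) \times BD(\Omega),$$

the first part with the same topology as in Example 2.2. We also take $Z = L^2(\Omega)$, denote $u = (v, w)$,
and set

$$K_0(v, w) = v, \quad A_1 u = Dv - w, \quad \text{and} \quad A_2 u = Ew$$

for $E$ the symmetrised differential. With $K = K_0$ and $Y = Z$, this yields second-order total generalised
variation ($\mathrm{TGV}^2$) denoising [11] for the sub-problem (D$_\alpha$). Assuming for simplicity that $\alpha_1, \alpha_2 > 0$
are constants, to show Assumption A-KA, we recall from [12] the existence of a constant $c = c(\Omega)$
such that

$$c^{-1}\|v\|_{BV(\Omega)} \le \|v\|'_{BV(\Omega)} \le c\|v\|_{BV(\Omega)},$$

where the norm

$$\|v\|'_{BV(\Omega)} := \mathrm{TGV}^2_{(1,1)}(v) + \|v\|_{L^1(\Omega)}.$$

We may thus approximate

$$\begin{aligned}
\|w\|_{L^1(\Omega; \mathbb{R}^n)} &\le \|Dv - w\|_{\mathcal{M}(\Omega; \mathbb{R}^n)} + \|Dv\|_{\mathcal{M}(\Omega; \mathbb{R}^n)} \\
&\le \|Dv - w\|_{\mathcal{M}(\Omega; \mathbb{R}^n)} + c\bigl(\mathrm{TGV}^2_{(1,1)}(v) + \|v\|_{L^1(\Omega)}\bigr) \\
&\le (1 + c)\bigl(\|Dv - w\|_{\mathcal{M}(\Omega; \mathbb{R}^n)} + \|Ew\|_{\mathcal{M}(\Omega; \mathbb{R}^{n \times n})}\bigr) + c\|v\|_{L^1(\Omega)}.
\end{aligned}$$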


^This can be achieved by allowing do G K{0,) instead of the Co{A) in the construction of [2, Remark 3.12]. 



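For some $C > 0$, it follows

$$\begin{aligned}
\|u\|_X &= \bigl(\|v\|_{L^2(\Omega)} + \|Dv\|_{\mathcal{M}(\Omega; \mathbb{R}^n)}\bigr) + \bigl(\|w\|_{L^1(\Omega; \mathbb{R}^n)} + \|Ew\|_{\mathcal{M}(\Omega; \mathbb{R}^{n \times n})}\bigr) \\
&\le C\bigl(\|Dv - w\|_{\mathcal{M}(\Omega; \mathbb{R}^n)} + \|Ew\|_{\mathcal{M}(\Omega; \mathbb{R}^{n \times n})} + \|v\|_{L^2(\Omega)}\bigr) \\
&= C\Bigl(\sum_{j=1}^N \|A_j u\|_j + \|Ku\|_{L^2(\Omega)}\Bigr).
\end{aligned}$$

This shows $\|u\|_X \le C\|u\|'_X$. The inequality $\|u\|_X \ge \|u\|'_X$ follows easily from the triangle inequality,
namely

$$\|u\|_X \ge \|v\|_{L^2(\Omega)} + \|Dv - w\|_{\mathcal{M}(\Omega; \mathbb{R}^n)} + \|Ew\|_{\mathcal{M}(\Omega; \mathbb{R}^{n \times n})}.$$

Thus $\|\cdot\|'_X$ is equivalent to $\|\cdot\|_X$.

Next we observe that clearly $Dv^i - w^i \rightharpoonup^* Dv - w$ in $\mathcal{M}(\Omega; \mathbb{R}^n)$ and $Ew^i \rightharpoonup^* Ew$ in $\mathcal{M}(\Omega; \mathbb{R}^{n \times n})$ if
$v^i \rightharpoonup^* v$ weakly* in $BV(\Omega)$ and $w^i \rightharpoonup^* w$ weakly* in $BD(\Omega)$. Thus $A_1$ and $A_2$ are weak* continuous. Assumption
A-KA follows.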

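Consider then the satisfaction of Assumption A-$\delta$. If $\ell = 1$, we may then pick $\bar f_{\delta,1} = (f, 0)$, in which
case $A_1 \bar f_{\delta,1} = Df$, and $A_2 \bar f_{\delta,1} = 0$. If $\ell = 2$, which is the only other case, we may pick a smooth
approximate $f_{\delta,2}$ to $f$ such that

$$\frac{1}{2}\|f - f_{\delta,2}\|_{L^2(\Omega)}^2 < \delta.$$

Then we set $\bar f_{\delta,2} := (f_{\delta,2}, \nabla f_{\delta,2})$, yielding $A_1 \bar f_{\delta,2} = 0$ and $A_2 \bar f_{\delta,2} = \nabla^2 f_{\delta,2}$. By $f_{\delta,2}$ being smooth we see
that $\|A_2 \bar f_{\delta,2}\|_{\mathcal{M}(\Omega; \mathbb{R}^{n \times n})} < \infty$. Thus Assumption A-$\delta$ is satisfied for the squared fidelity $\Phi(v) = \frac{1}{2}\|f - v\|_{L^2(\Omega)}^2$.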

Example 2.4 (Infimal convolution TV denoising). Let us take $Z = L^2(\Omega)$, and $X = (BV(\Omega) \cap
L^2(\Omega)) \times X_2$, where

$$X_2 := \{v \in W^{1,1}(\Omega) \mid \nabla v \in BV(\Omega; \mathbb{R}^n)\},$$

and $BV(\Omega) \cap L^2(\Omega)$ again has the topology of Example 2.2. Setting $u = (v, w)$, as well as

$$K_0(v, w) = v + w, \quad A_1 u = Dv, \quad \text{and} \quad A_2 u = D\nabla w,$$

we obtain for (D$_\alpha$) with $K = K_0$ and $Y = Z$ the infimal convolution total variation denoising model
of [15]. Assumption A-KA, A-$\delta$, and A-H are verified analogously to $\mathrm{TGV}^2$ in Example 2.3.

Example 2.5 (Cost functionals). For the cost functional $F$, given noise-free data $f_0 \in Z = L^2(\Omega)$,
we consider in particular the $L^2$ cost

$$F_{L^2_2}(u) := \frac{1}{2}\|f_0 - K_0 u\|_{L^2(\Omega)}^2,$$

as well as the Huberised total variation cost

$$F_{L^1_\eta \nabla}(u) := \|D(f_0 - K_0 u)\|_{\eta, \mathcal{M}(\Omega; \mathbb{R}^n)},$$

with noise-free data $f_0 \in Z := BV(\Omega)$. For the definition of the Huberised total variation, we refer to
Section 2.2 on the numerics of the bi-level framework (P).

Example 2.6 (Sub-sampled Fourier transforms for MRI). Let $K_0$ and the $A_j$'s be given by one of the
regularisers of Example 2.2 to 2.4. Also take the cost $F = F_{L^2_2}$ or $F_{L^1_\eta \nabla}$ as in Example 2.5, and $\Phi$ as
the squared $L^2$ fidelity of Example 2.1. However, let us now take $K = T K_0$ for some bounded linear
operator $T : Z \to Y$. The operator $T$ could be, for example, a blurring kernel or a (sub-sampled) Fourier
transform, in which case we obtain a model for learning the parameters for deblurring or reconstruction
from Fourier samples. The latter would be important, for example, for magnetic resonance imaging
(MRI) [5, 44, 45]. Unfortunately, our theory does not extend to many of these cases because we will
require, roughly, $K_0^* K_0 \le C K^* K$ for some constant $C > 0$.



Example 2.7 (Parameter estimation in Bingham flow). Bingham fluids are materials that behave as 
solids if the magnitude of the stress tensor stays below a plasticity threshold, and as liquids if that 
quantity surpasses the threshold. In a cross sectional pipe, of section Q, the behaviour is modeled by 
the energy minimization functional 




(2.4) 


where /r > 0 stands for the viscosity coefficient, a > 0 for the plasticity threshold and / G In 

many practical situations, the plasticity threshold is not known in advance and has to be estimated 
from experimental measurements. One then aims at minimizing a least squares term 

= 2 11 '“ “ fo\\h(n) 


subject to (2.4). 

The bilevel optimisation problem can then be formulated as problem (P)-(D$_\alpha$), with the choices
X = Y = K the identity, Ai = D and 

Concentrating in the rest of this paper primarily on image processing applications, we will however 
briefly return to Bingham flow in Example 5.1. 


2.2. Considerations for numerical implementation 

For the numerical solution of the denoising sub-problem, we will in a follow-up work [24] expand 
upon the infeasible semi-smooth quasi-Newton approach taken in [34] for L^-TV image restoration 
problems. This depends on Huber-regularisation of the total variation measures, as well as enforcing
smoothness through Hilbert spaces. This is usually done by a squared penalty on the gradient, i.e.,
$H^1$ regularisation, but we formalise this more abstractly in order to simplify our notation and arguments
later on. Therefore, we take a convex, proper, and weak* lower semicontinuous smoothing functional
$H : X \to [0, \infty]$, and generally expect it to satisfy the following.

Assumption A-H (Smoothing). We assume that $0 \in \operatorname{dom} H$ and for every $\delta > 0$, every $\alpha \in \mathcal{P}_\alpha^\infty$,
and every $u \in X$, the existence of $u_\delta \in X$ satisfying

$$H(u_\delta) < \infty \quad \text{and} \quad J(u_\delta; \alpha) \le J(u; \alpha) + \delta. \tag{2.5}$$

Example 2.8 ($H^1$ smoothing in $BV(\Omega)$). Usually, with $X \cap H^1(\Omega) \neq \emptyset$, we take

$$H(u) := \begin{cases} \frac{1}{2}\|\nabla u\|_{L^2(\Omega; \mathbb{R}^n)}^2, & u \in X \cap H^1(\Omega), \\ \infty, & \text{otherwise}. \end{cases}$$

This is in particular the case with Example 2.2 (TV), where $X = BV(\Omega) \cap L^2(\Omega) \supset H^1(\Omega)$, and
Example 2.3 ($\mathrm{TGV}^2$), where $X = (BV(\Omega) \cap L^2(\Omega)) \times BD(\Omega) \supset H^1(\Omega) \times H^1(\Omega; \mathbb{R}^n)$ on a bounded
domain $\Omega$. In both of these cases, weak* lower semicontinuity is apparent; for completeness we record
this in Lemma 4.1 in Section 4.2. In case of Example 2.2, (2.5) is immediate from approximating
$u$ strictly by functions in $C^\infty(\Omega)$ using standard strict approximation results in $BV(\Omega)$ [2]. In case
of Example 2.3, this also follows by a simple generalisation of the same argument to TGV-strict
approximation, as presented in [43, 10].


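For parameters $\epsilon \ge 0$ and $\gamma \in (0, \infty]$, we then consider the problem

$$\min_{\alpha \in \mathcal{P}_\alpha^\infty} F(u_{\alpha,\gamma,\epsilon}) \tag{P$^{\gamma,\epsilon}$}$$

where $u_{\alpha,\gamma,\epsilon} \in X \cap \operatorname{dom} \epsilon H$ solves

$$\min_{u \in X} J^{\gamma,\epsilon}(u; \alpha) \tag{D$^{\gamma,\epsilon}$}$$

for

$$J^{\gamma,\epsilon}(u; \alpha) := \epsilon H(u) + \Phi(Ku) + \sum_{j=1}^N \alpha_j \|A_j u\|_{\gamma,j}.$$

Here we denote for short the Huber-regularised total variation norm

$$\|\mu\|_{\gamma,j} := \|\mu\|_{\gamma, \mathcal{M}(\Omega; \mathbb{R}^{m_j})}, \quad (\mu \in \mathcal{M}(\Omega; \mathbb{R}^{m_j})),$$

as given by the following definition. There we interpret $\gamma = \infty$ to give back the standard unregularised
total variation measure. Clearly (D$_\alpha$) corresponds to (D$^{\infty,0}$) and (P) to (P$^{\gamma,\epsilon}$) with
$(\gamma, \epsilon) = (\infty, 0)$.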


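Definition 2.1. Given $\gamma \in (0, \infty]$, we define for the norm $\|\cdot\|_2$ on $\mathbb{R}^m$ the Huber regularisation

$$|g|_\gamma = \begin{cases} \|g\|_2 - \frac{1}{2\gamma}, & \|g\|_2 \ge 1/\gamma, \\ \frac{\gamma}{2}\|g\|_2^2, & \|g\|_2 < 1/\gamma. \end{cases}$$

We observe that this can equivalently be written using convex conjugates as

$$\alpha|g|_\gamma = \sup\Bigl\{ \langle q, g \rangle - \frac{1}{2\gamma\alpha}\|q\|_2^2 \Bigm| \|q\|_2 \le \alpha \Bigr\}. \tag{2.6}$$

Then if $\mu = f\mathcal{L}^n + \mu^s$ is the Lebesgue decomposition of $\mu \in \mathcal{M}(\Omega; \mathbb{R}^m)$ into the absolutely continuous
part $f\mathcal{L}^n$ and the singular part $\mu^s$, we set

$$|\mu|_\gamma(V) := \int_V |f(x)|_\gamma \, dx + |\mu^s|(V), \quad (V \in \mathcal{B}(\Omega)).$$

The measure $|\mu|_\gamma$ is the Huber-regularisation of the total variation measure $|\mu|$, and we define its
Radon norm as the Huber regularisation of the Radon norm of $\mu$, that is

$$\|\mu\|_{\gamma, \mathcal{M}(\Omega; \mathbb{R}^m)} := \bigl\| |\mu|_\gamma \bigr\|_{\mathcal{M}(\Omega)}.$$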

Remark 2.4. The parameter $\gamma$ varies in the literature. In this paper, we use the convention of [37],
where large $\gamma$ means small Huber regularisation. In [34], small $\gamma$ means small Huber regularisation.
That is, their regularisation parameter is $1/\gamma$ in our notation.
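As a pointwise illustration of this convention, the sketch below evaluates a Huber function in two ways. It assumes the standard piecewise form $|g|_\gamma = \|g\|_2 - 1/(2\gamma)$ for $\|g\|_2 \ge 1/\gamma$ and $\frac{\gamma}{2}\|g\|_2^2$ otherwise, and the dual form $\alpha|g|_\gamma = \sup\{\langle q, g\rangle - \|q\|_2^2/(2\gamma\alpha) : \|q\|_2 \le \alpha\}$; both formulas and the function names `huber` and `huber_dual` are assumptions made for this example.

```python
import numpy as np

def huber(g, gamma):
    """Piecewise Huber value |g|_gamma; gamma = inf gives the plain Euclidean norm."""
    r = np.linalg.norm(g)
    if np.isinf(gamma):
        return r
    if r >= 1.0 / gamma:
        return r - 0.5 / gamma
    return 0.5 * gamma * r ** 2

def huber_dual(g, gamma, alpha=1.0):
    """alpha * |g|_gamma via the dual (conjugate) formula, for finite gamma only.

    The objective is a concave quadratic in q with isotropic curvature, so its
    maximiser over the ball ||q|| <= alpha is the projection of the
    unconstrained maximiser gamma * alpha * g onto that ball.
    """
    g = np.asarray(g, dtype=float)
    q = gamma * alpha * g
    nq = np.linalg.norm(q)
    if nq > alpha:
        q = q * (alpha / nq)
    return q @ g - (q @ q) / (2.0 * gamma * alpha)

small, large = [0.03, 0.04], [3.0, 4.0]   # norms 0.05 and 5, threshold 1/gamma = 0.1
for g in (small, large):
    print(huber(g, 10.0), huber_dual(g, 10.0, alpha=0.7) / 0.7)
```

Increasing `gamma` shrinks the quadratic zone around the origin, which is why large $\gamma$ corresponds to little smoothing in this convention.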


2.3. Shorthand notation 


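Writing for notational lightness

$$Au := (A_1 u, \ldots, A_N u),$$

and

$$R^\gamma(\mu_1, \ldots, \mu_N; \alpha) := \sum_{j=1}^N \alpha_j \|\mu_j\|_{\gamma,j}, \qquad R(\,\cdot\,; \alpha) := R^\infty(\,\cdot\,; \alpha),$$

our problem (P$^{\gamma,\epsilon}$) becomes

$$\min_{\alpha \in \mathcal{P}_\alpha^\infty} F(u_\alpha) \quad \text{subject to} \quad u_\alpha \in \arg\min_{u \in X} J^{\gamma,\epsilon}(u; \alpha)$$

for

$$J^{\gamma,\epsilon}(u; \alpha) := \epsilon H(u) + \Phi(Ku) + R^\gamma(Au; \alpha).$$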




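Further, given $\bar\alpha \in \mathcal{P}_\alpha^\infty$, we define the "marginalised" regularisation functional

$$T^\gamma(v; \bar\alpha) := \inf_{\bar v \in K^{-1} v} R^\gamma(A\bar v; \bar\alpha), \qquad T(\,\cdot\,; \bar\alpha) := T^\infty(\,\cdot\,; \bar\alpha). \tag{2.7}$$

Here $K^{-1}$ stands for the preimage, so the constraint is $\bar v \in X$ with $v = K\bar v$. Then in the case
$(\gamma, \epsilon) = (\infty, 0)$ and $F = F_0 \circ K$, our problem may also be written as

$$\min_{\alpha \in \mathcal{P}_\alpha^\infty} F_0(v_\alpha) \quad \text{subject to} \quad v_\alpha \in \arg\min_{v} \Phi(v) + T(v; \alpha).$$

This gives the problem a much more conventional flair, as the following examples demonstrate.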


Example 2.9 (TV as a marginal). Consider the total variation regularisation of Example 2.2. Then
$A_1 = D$, $N = 1$, and

$$T(v; \bar\alpha) = R(Av; \bar\alpha) = \bar\alpha \mathrm{TV}(v).$$

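Example 2.10 ($\mathrm{TGV}^2$ as a marginal). In case of the $\mathrm{TGV}^2$ regularisation of Example 2.3, we have
$K(v, w) = v$ and $K^\dagger v = (v, 0)$. Thus $K K^\dagger f = f$, etc., so

$$T(v; \bar\alpha) = \inf_{w \in BD(\Omega)} R(Dv - w, Ew; \bar\alpha) = \mathrm{TGV}^2_{\bar\alpha}(v).$$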
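The marginal form of the second-order total generalised variation can be probed numerically by evaluating the inner objective at particular choices of $w$. The sketch below is a one-dimensional discrete analogue, assuming forward differences for $D$ and using $E = D$ (in one dimension the symmetrised derivative is the ordinary derivative); the name `tgv2_objective` and the ramp signal are illustrative only.

```python
import numpy as np

def tgv2_objective(v, w, alpha1, alpha2):
    """alpha1 * ||Dv - w||_1 + alpha2 * ||Dw||_1 for 1D arrays with len(w) == len(v) - 1."""
    return alpha1 * np.abs(np.diff(v) - w).sum() + alpha2 * np.abs(np.diff(w)).sum()

v = np.linspace(0.0, 1.0, 11)                          # an affine ramp
via_tv = tgv2_objective(v, np.zeros(10), 1.0, 2.0)     # w = 0 gives alpha1 * TV(v)
via_tv2 = tgv2_objective(v, np.diff(v), 1.0, 2.0)      # w = Dv gives alpha2 * TV(Dv)
upper_bound = min(via_tv, via_tv2)                     # the infimum over w is no larger

print(via_tv, via_tv2, upper_bound)
```

Both choices of $w$ only give upper bounds on the infimum; for the ramp the second one is already numerically zero, whereas the first-order total variation of the same signal is not.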


3. Main results 

Our task now is to study the characteristics of optimal solutions, and their existence. Our results, based
on natural assumptions on the data and the original problem (P), derive properties of the solutions to
this problem and all numerically regularised problems (P$^{\gamma,\epsilon}$) sufficiently close to the original problem:
large $\gamma > 0$ and small $\epsilon > 0$. We denote by $u_{\alpha,\gamma,\epsilon}$ any solution to (D$^{\gamma,\epsilon}$) for any given $\alpha \in \mathcal{P}_\alpha^\infty$, and
by $\hat\alpha_{\gamma,\epsilon}$ any solution to (P$^{\gamma,\epsilon}$). Solutions to (D$_\alpha$) and (P) we denote, respectively, by $u_\alpha = u_{\alpha,\infty,0}$ and
$\hat\alpha = \hat\alpha_{\infty,0}$.

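3.1. $L^2$-squared cost and $L^2$-squared fidelity

Our main existence result regarding $L^2$-squared costs and $L^2$-squared fidelities is the following.

Theorem 3.1. Let $Y$ and $Z$ be Hilbert spaces, $F(u) = \frac{1}{2}\|K_0 u - f_0\|_Z^2$, and $\Phi(v) = \frac{1}{2}\|f - v\|_Y^2$ for
some $f \in \mathcal{R}(K)$, $f_0 \in Z$, and a bounded linear operator $K_0 : X \to Z$ satisfying

$$\|K_0 u\|_Z \le C_0 \|Ku\|_Y \quad \text{for all } u \in X \text{ for some constant } C_0 > 0. \tag{3.1}$$

Suppose Assumption A-KA and A-$\delta$ hold. If for some $\bar\alpha \in \operatorname{int} \mathcal{P}_\alpha^\infty$ and $t \in (0, 1/C_0]$ holds

$$T(f; \bar\alpha) > T\bigl(f - t(K_0 K^\dagger)^*(K_0 K^\dagger f - f_0); \bar\alpha\bigr), \tag{3.2}$$

then problem (P) admits a solution $\hat\alpha \in \operatorname{int} \mathcal{P}_\alpha^\infty$.

If, moreover, A-H holds, then there exist $\bar\gamma \in (0, \infty)$ and $\bar\epsilon \in (0, \infty)$ such that the problem (P$^{\gamma,\epsilon}$)
with $(\gamma, \epsilon) \in [\bar\gamma, \infty] \times [0, \bar\epsilon]$ admits a solution $\hat\alpha_{\gamma,\epsilon} \in \operatorname{int} \mathcal{P}_\alpha^\infty$, and the solution map

$$S(\gamma, \epsilon) := \arg\min_{\alpha \in \mathcal{P}_\alpha^\infty} F(u_{\alpha,\gamma,\epsilon})$$

is outer semicontinuous within $[\bar\gamma, \infty] \times [0, \bar\epsilon]$.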


We prove this result in Section 5. Outer semicontinuity of a set-valued map $S : \mathbb{R}^k \rightrightarrows \mathbb{R}^m$ means [39]
that for any convergent sequence $x^k \to x$ and $S(x^k) \ni y^k \to y$, we have $y \in S(x)$. In particular, the
outer semicontinuity of $S$ means that as the numerical regularisation vanishes, the optimal parameters
for the regularised models tend to optimal parameters of the original model (P).

Remark 3.1. Let $Z = Y$ and $K_0 = K$ in Theorem 3.1. Then (3.2) reduces to

$$T(f; \bar\alpha) > T(f_0; \bar\alpha). \tag{3.3}$$

Also observe that our result requires $\Phi \circ K$ to measure all the data that $F$ measures, in the more
precise sense given by (3.1). If (3.1) did not hold, an oscillating solution $u_\alpha$ for $\alpha \in \partial\mathcal{P}_\alpha^\infty$ could largely
pass through the nullspace of $K$, hence have low value for the objective $J$ of the inner problem, yet
have a large cost given by $F$.
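The reduction in Remark 3.1 can be traced formally as follows. This is only a sketch: beyond $Z = Y$ and $K_0 = K$, it assumes that $f_0 \in \mathcal{R}(K)$ and that $K K^\dagger$, together with its adjoint, may be treated as the identity there. Under these assumptions $C_0 = 1$ is admissible in the bound $\|K_0 u\|_Z \le C_0\|Ku\|_Y$, so that $t = 1$ is an allowed step length.

```latex
\begin{align*}
K_0 K^\dagger f &= K K^\dagger f = f, \\
f - t\,(K_0 K^\dagger)^*(K_0 K^\dagger f - f_0) &= f - t\,(f - f_0) \\
&= f_0 \qquad \text{for } t = 1 .
\end{align*}
```

With the argument of the marginal functional on the right-hand side equal to $f_0$, the general condition becomes the comparison of $T(f; \bar\alpha)$ with $T(f_0; \bar\alpha)$.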

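Corollary 3.1 (Total variation Gaussian denoising). Suppose $f, f_0 \in BV(\Omega) \cap L^2(\Omega)$, and

$$\mathrm{TV}(f) > \mathrm{TV}(f_0). \tag{3.4}$$

Then there exist $\bar\epsilon, \bar\gamma > 0$ such that any optimal solution $\hat\alpha_{\gamma,\epsilon}$ to the problem

$$\hat\alpha_{\gamma,\epsilon} \in \arg\min_{\alpha \ge 0} \frac{1}{2}\|f_0 - u_{\alpha,\gamma,\epsilon}\|_{L^2(\Omega)}^2$$

with

$$u_{\alpha,\gamma,\epsilon} \in \arg\min_{u \in BV(\Omega)} \Bigl( \frac{1}{2}\|f - u\|_{L^2(\Omega)}^2 + \alpha\|Du\|_{\gamma, \mathcal{M}(\Omega; \mathbb{R}^n)} + \frac{\epsilon}{2}\|\nabla u\|_{L^2(\Omega; \mathbb{R}^n)}^2 \Bigr)$$

satisfies $\hat\alpha_{\gamma,\epsilon} > 0$ whenever $\epsilon \in [0, \bar\epsilon]$, $\gamma \in [\bar\gamma, \infty]$.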


That is, for the optimal parameter to be strictly positive, the noisy image $f$ should, in terms of
the total variation, oscillate more than the noise-free image $f_0$. This is a very natural condition: if
the noise somehow had smoothed out features from $f_0$, then we should not smooth it any more by TV
regularisation!

Proof. Assumption A-KA, A-$\delta$, and A-H we have already verified in Example 2.2. We then observe
that $K_0 = K$, so we are in the setting of Remark 3.1. Following the mapping of the TV problem to the
general framework using the construction in Example 2.2, we have $K = I$ and $K^\dagger = I$ embeddings with
$Y = L^2(\Omega)$. $K^\dagger$ is bounded on $\mathcal{R}(K) = L^2(\Omega) \cap BV(\Omega)$. Moreover, by Example 2.9, $T(v; \bar\alpha) = \bar\alpha \mathrm{TV}(v)$.
Thus (3.3) with the choice $t = 1$ reduces to (3.4). □
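The statement of Corollary 3.1 can be explored on a toy problem. The sketch below is a one-dimensional, finite-difference analogue and not the paper's method: the lower-level problem is Huber-smoothed with a hypothetical $\gamma = 50$ and $\epsilon = 0$, it is solved by plain gradient descent, and the upper level is a grid search over $\alpha$. The signal, the alternating perturbation standing in for noise, and all function names are assumptions made for illustration.

```python
import numpy as np

def solve_lower_level(f, alpha, gamma=50.0, iters=3000):
    """Gradient descent on 0.5*||u - f||^2 + alpha * sum_i huber_gamma((Du)_i)."""
    u = f.copy()
    step = 1.0 / (1.0 + 4.0 * alpha * gamma)        # 1/L, using ||D||^2 <= 4
    for _ in range(iters):
        s = np.clip(gamma * np.diff(u), -1.0, 1.0)  # derivative of the Huber function at Du
        dts = np.concatenate(([-s[0]], s[:-1] - s[1:], [s[-1]]))  # adjoint D^T applied to s
        u -= step * ((u - f) + alpha * dts)
    return u

def upper_level_cost(f, f0, alpha):
    """Squared L2 distance of the denoised signal to the ground truth."""
    return 0.5 * np.sum((f0 - solve_lower_level(f, alpha)) ** 2)

def tv(u):
    return np.abs(np.diff(u)).sum()

n = 40
f0 = np.where(np.arange(n) < n // 2, 0.0, 1.0)      # step signal as ground truth
f = f0 + 0.1 * (-1.0) ** np.arange(n)               # deterministic oscillating perturbation

alphas = np.linspace(0.0, 0.5, 26)
costs = np.array([upper_level_cost(f, f0, a) for a in alphas])
best_alpha = alphas[np.argmin(costs)]

print(tv(f) > tv(f0), best_alpha, costs.min(), costs[0])
```

Here the data satisfy the discrete analogue of (3.4), and the grid search selects a strictly positive parameter whose cost is below the cost of leaving the data unregularised at $\alpha = 0$.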

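For $\mathrm{TGV}^2$ we also have a very natural condition.

Corollary 3.2 (Second-order total generalised variation Gaussian denoising). Suppose that the data
$f, f_0 \in L^2(\Omega) \cap BV(\Omega)$ satisfies for some $\alpha_2 > 0$ the condition

$$\mathrm{TGV}^2_{(\alpha_2, 1)}(f) > \mathrm{TGV}^2_{(\alpha_2, 1)}(f_0). \tag{3.5}$$

Then there exist $\bar\epsilon, \bar\gamma > 0$ such that any optimal solution $\hat\alpha_{\gamma,\epsilon} = ((\hat\alpha_{\gamma,\epsilon})_1, (\hat\alpha_{\gamma,\epsilon})_2)$ to the problem

$$\hat\alpha_{\gamma,\epsilon} \in \arg\min_{\alpha \ge 0} \frac{1}{2}\|f_0 - v_{\alpha,\gamma,\epsilon}\|_{L^2(\Omega)}^2$$

with

$$(v_{\alpha,\gamma,\epsilon}, w_{\alpha,\gamma,\epsilon}) \in \arg\min_{v \in BV(\Omega),\, w \in BD(\Omega)} \Bigl( \frac{1}{2}\|f - v\|_{L^2(\Omega)}^2 + \alpha_1\|Dv - w\|_{\gamma, \mathcal{M}(\Omega; \mathbb{R}^n)} + \alpha_2\|Ew\|_{\gamma, \mathcal{M}(\Omega; \mathbb{R}^{n \times n})} + \frac{\epsilon}{2}\|(\nabla v, \nabla w)\|_{L^2(\Omega; \mathbb{R}^n \times \mathbb{R}^{n \times n})}^2 \Bigr)$$

satisfies $(\hat\alpha_{\gamma,\epsilon})_1, (\hat\alpha_{\gamma,\epsilon})_2 > 0$ whenever $\epsilon \in [0, \bar\epsilon]$, $\gamma \in [\bar\gamma, \infty]$.

Proof. Assumption A-KA, A-$\delta$, and A-H we have already verified in Example 2.3. We then observe
that $K_0 = K$, so we are in the setting of Remark 3.1. By Example 2.10, $T(v; \bar\alpha) = \mathrm{TGV}^2_{\bar\alpha}(v)$. Finally,
similarly to (3.4), we get for (3.3) with $t = 1$ the condition (3.5). □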

Example 3.1 (Fourier reconstructions). Let $K_0$ be given, for example as constructed in Example 2.2
or Example 2.3. If we take $K = \mathcal{F} K_0$ for $\mathcal{F}$ the Fourier transform, or any other unitary transform,
then (3.1) is satisfied and $K^\dagger = K_0^\dagger \mathcal{F}^*$. Thus (3.2) becomes

$$T(f; \bar\alpha) > T\bigl(f - t(K_0 K_0^\dagger \mathcal{F}^*)^*(K_0 K_0^\dagger \mathcal{F}^* f - f_0); \bar\alpha\bigr).$$

With $\mathcal{F}^* f, f_0 \in \mathcal{R}(K_0)$ and $t = 1$ this just reduces to

$$T(f; \bar\alpha) > T(\mathcal{F} f_0; \bar\alpha).$$

Unfortunately, our results do not cover parameter learning for reconstruction from partial Fourier
samples exactly because of (3.1). What we can do is to find the optimal parameters if we only know
a part of the ground-truth, but have full noisy data.


3.2. Huberised total variation and other $L^1$-type costs with $L^2$-squared fidelity

We now consider the alternative "Huberised total variation" cost functional from Example 2.5. Unfortunately,
we are unable to derive for it conditions as easily interpretable as for the $L^2$ cost. If we discretise the
definition in the following sense, then we however get natural conditions. So, we let $f_0 \in Z$, assuming
$Z$ is a reflexive Banach space, and pick $\eta \in (0, \infty]$. We define

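$$F_{L^1_\eta}(z) := \sup_{\|\lambda\|_{Z^*} \le 1} (\lambda | z - f_0) - \frac{1}{2\eta}\|\lambda\|_{Z^*}^2.$$

If

$$V = \{\xi_1, \ldots, \xi_M\} \subset \{\xi \in C_c^\infty(\Omega; \mathbb{R}^n) \mid \|\xi\| \le 1\}$$

is finite, we define

$$D_V v := \bigl((-\operatorname{div} \xi_1 | v), \ldots, (-\operatorname{div} \xi_M | v)\bigr), \quad (v \in BV(\Omega)).$$

We may now approximate $F_{L^1_\eta \nabla}$ by

$$F_{L^1_\eta} \circ D_V, \quad \text{with } D_V f_0 \text{ in place of } f_0 \text{ and } Z = \mathbb{R}^M \text{ with the } \infty\text{-norm}.$$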

We return to this approximation after the following general results on Fi^i. 

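Theorem 3.2. Let $Y$ be a Hilbert space, and $Z$ a reflexive Banach space. Let $F = F_{L^1_\eta} \circ K_0$, and
$\Phi(v) = \frac{1}{2}\|f - v\|_Y^2$ for some compact linear operator $K_0 : X \to Z$ satisfying (3.1) and $f \in \mathcal{R}(K)$.
Suppose Assumption A-KA and A-$\delta$ hold. If for some $\bar\alpha \in \operatorname{int} \mathcal{P}_\alpha^\infty$ and $t > 0$ holds

$$T(f; \bar\alpha) > T\bigl(f - t(K_0 K^\dagger)^* \lambda; \bar\alpha\bigr), \quad \lambda \in \partial F_{L^1_\eta}(K_0 K^\dagger f), \tag{3.6}$$

then the problem (P) admits a solution $\hat\alpha \in \operatorname{int} \mathcal{P}_\alpha^\infty$.

If, moreover, A-H holds, then there exist $\bar\gamma \in (0, \infty)$ and $\bar\epsilon \in (0, \infty)$ such that the
problem (P$^{\gamma,\epsilon}$) with $(\gamma, \epsilon) \in [\bar\gamma, \infty] \times [0, \bar\epsilon]$ admits a solution $\hat\alpha_{\gamma,\epsilon} \in \operatorname{int} \mathcal{P}_\alpha^\infty$, and the solution map $S$ is
outer semicontinuous within $[\bar\gamma, \infty] \times [0, \bar\epsilon]$.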


We prove this result in Section 5. 


Remark 3.2. If $K_0 = K$, the condition (3.6) has the much more legible form

$$T(f; \bar\alpha) > T(f - t\lambda; \bar\alpha), \quad \lambda \in \partial F_{L^1_\eta}(f). \tag{3.7}$$

Also if $K$ is compact, then the compactness of $K_0$ follows from (3.1). In the following applications, $K$
is however not compact for typical domains $\Omega \subset \mathbb{R}^2$ or $\mathbb{R}^3$, so we have to make $K_0$ compact by making
the range finite-dimensional.

Corollary 3.3 (Total variation Gaussian denoising with discretised Huber-TV cost). Suppose that 
the data satisfies /, /o E BV(n) n and for some t > 0 and ^ E lA the condition 

TV(/) > TV(/ + fdivO, -div^ E dFl,^{f). (3.8) 

Then there exists e, 7 > 0 such any optimal solution to the problem 

m“F2;v(/o-^;a) 

with ^ 

Ue, E argmin(-||/ - u\\\ 2 m) + a\\Du\\^^M + 

■ueBV(0)^A ^ ^ I V . 

satisfies > 0 whenever e E [ 0 ,e], 7 E [7,00]. 


This says that for the optimal parameter to be strictly positive, the noisy image $f$ should oscillate
more than the image $f + t \operatorname{div} \xi$ in the direction of the (discrete) total variation flow. This is a very
natural condition, and we observe that the non-discretised counterpart of (3.8) for $\gamma = \infty$ would be

$$\mathrm{TV}(f) > \mathrm{TV}(f + t \operatorname{div} \xi), \quad \xi \in \operatorname{Sgn}(Df_0 - Df),$$

where we define for a measure $\mu \in \mathcal{M}(\Omega; \mathbb{R}^m)$ the sign

$$\operatorname{Sgn}(\mu) := \{\xi \in L^1(\Omega; \mu) \mid \mu = \xi|\mu|\}.$$

That is, $-\operatorname{div} \xi$ is the total variation flow.


Proof. Analogous to Corollary 3.1 regarding the cost. 


□ 


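For $TGV^2$ we also have an analogous natural condition.

Corollary 3.4 ($TGV^2$ Gaussian denoising with discretised Huber-TV cost). Suppose that the data $f, f_0 \in L^2(\Omega)$ satisfies for some $t, \alpha_2 > 0$ and $\lambda \in V$ the condition

\[ TGV^2_{(\alpha_2,1)}(f) > TGV^2_{(\alpha_2,1)}(f + t\operatorname{div}\lambda), \quad -\operatorname{div}\lambda \in \partial F_{L^1_\eta}(f). \]

Then there exist $\bar\epsilon, \bar\gamma > 0$ such that any optimal solution $\alpha_{\gamma,\epsilon} = ((\alpha_{\gamma,\epsilon})_1, (\alpha_{\gamma,\epsilon})_2)$ to the problem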

min^Liv(/o-^a) 

Q :>0 ^ 


with 


veBV{n) 


(va,Wa) E argmin(-||/-u||^2(s7) + ai\\Dv - w\\^^m + Oi 2 \\Ew\\y^M 


+ H|(Vu,Vu;)||i2(^^j 


'n^mnxn\ 


satisfies (07,6)1,(07,6)2 > 0 whenever e E [ 0 ,e], 7 E [7,00]. 


(3.9) 


Proof. Analogous to Corollary 3.2. 


□ 


4. A few auxiliary results 


We record in this section some general results that will be useful in the proofs of the main results. These include the coercivity of the functional $J^{\gamma,\epsilon}(\,\cdot\,;\alpha)$, recorded in Section 4.1. We then discuss some elementary lower semicontinuity facts in Section 4.2. We provide in Section 4.3 some new results for passing from strict convergence to strong convergence.

4.1. Coercivity 


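Observe that

\[ R^\gamma(Au;\alpha) = \sup\Big\{ \sum_{j=1}^N \alpha_j\Big( \int_\Omega \psi_j \, dA_ju - \frac{1}{2\gamma}\|\psi_j\|^2_{L^2(\Omega;\mathbb{R}^{m_j})} \Big) \;\Big|\; \psi_j \in C_c^\infty(\Omega;\mathbb{R}^{m_j}),\ \sup_{x \in \Omega}\|\psi_j(x)\|_2 \le 1 \Big\}. \]

Thus

\[ R(Au;\alpha) \ge R^\gamma(Au;\alpha) \ge R(Au;\alpha) - \frac{C}{2\gamma}\mathcal{L}^n(\Omega) \]

for some $C = C'(\alpha)$. Since $\Omega$ is bounded, it follows that given $\delta > 0$, for large enough $\gamma > 0$ and every $\epsilon \ge 0$ holds

\[ J(u;\alpha) - \delta \le J^{\gamma,0}(u;\alpha) \le J^{\gamma,\epsilon}(u;\alpha) \le J^{\infty,\epsilon}(u;\alpha). \tag{4.1} \]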


We will use these properties frequently. Based on the coercivity and norm equivalence properties in Assumption A-KA and Assumption A-$\Phi$, the following proposition states the important fact that $J^{\gamma,\epsilon}$ is coercive with respect to $\|\cdot\|'_X$ and thus also the standard norm of $X$.

Proposition 4.1. Suppose Assumption A-KA and Assumption A-$\Phi$ hold, and that $\alpha \in \operatorname{int} P_\alpha$. Let $\epsilon \ge 0$ and $\gamma \in (0,\infty]$. Then

\[ J^{\gamma,\epsilon}(u;\alpha) \to +\infty \quad \text{as} \quad \|u\|_X \to +\infty. \tag{4.2} \]

Proof. Let $\{u^i\}_{i=1}^\infty \subset X$, and suppose $\sup_i J^{\gamma,\epsilon}(u^i;\alpha) < \infty$. Then in particular $\sup_i \Phi(Ku^i) < \infty$. By Assumption A-$\Phi$ then $\sup_i \|Ku^i\|_Y < \infty$. But Assumption A-KA says

N 

\W\\x = E + WKu^Wy < + ||A:u*||y. 

j=i 

This implies $\sup_i \|u^i\|'_X < \infty$. By the equivalence of norms in Assumption A-KA, we immediately obtain (4.2). □


4.2. Lower semicontinuity 

We record the following elementary lower semicontinuity facts that we have already used to justify 
our examples. 

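Lemma 4.1. The following functionals are lower semicontinuous.

(i) $\mu \mapsto \|\nu - \mu\|_{\gamma,\mathcal{M}}$ with respect to weak* convergence in $\mathcal{M}(\Omega;\mathbb{R}^m)$.

(ii) $v \mapsto \|f - v\|^2_{L^2(\Omega;\mathbb{R}^d)}$ with respect to weak convergence in $L^p(\Omega;\mathbb{R}^d)$ for any $1 < p \le 2$ on a bounded domain $\Omega$.

(iii) $v \mapsto \|\nabla(f - v)\|^2_{L^2(\Omega;\mathbb{R}^{d \times n})}$ with respect to strong convergence in $L^2(\Omega;\mathbb{R}^d)$.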



Proof. In each case, let $\{v^i\}_{i=1}^\infty$ converge to $v$. Denoting by $G$ the involved functional, we write it as a convex conjugate, $G(v) = \sup_\varphi\{\langle v,\varphi\rangle - G^*(\varphi)\}$. Taking a supremising sequence $\{\varphi^j\}_{j=1}^\infty$ for this functional at any point $v$, we easily see lower semicontinuity by considering the sequences $\{\langle v^i,\varphi^j\rangle - G^*(\varphi^j)\}_{i=1}^\infty$ for each $j$. In case (ii) we use the fact that the Lebesgue spaces involved are nested when $\Omega$ is bounded.


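In case (i), how exactly we write $G(\mu) = \|\nu - \mu\|_{\gamma,\mathcal{M}}$ as a convex conjugate demands explanation. We first of all recall that for $g \in \mathbb{R}^m$, the Huber-regularised norm may be written in dual form as

\[ |g|_\gamma = \sup\Big\{ \langle q, g\rangle - \frac{1}{2\gamma}\|q\|_2^2 \;\Big|\; \|q\|_2 \le 1 \Big\}. \]

Therefore, we find that

\[ G(\mu) = \sup\Big\{ (\nu - \mu)(\varphi) - \int_\Omega \frac{1}{2\gamma}\|\varphi(x)\|_2^2 \, dx \;\Big|\; \varphi \in C_c^\infty(\Omega;\mathbb{R}^m),\ \|\varphi(x)\|_2 \le 1 \text{ for every } x \in \Omega \Big\}. \]

This has the required form. □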
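The dual form of the Huber-regularised norm used in this proof can be checked numerically. The following Python sketch compares a brute-force evaluation of the supremum over the unit disc in $\mathbb{R}^2$ with the usual closed-form expression of the Huberised Euclidean norm. The closed form, the grid resolution, the sample points and all function names are assumptions made for illustration and are not taken from the text.

```python
import math

def huber_closed_form(g, gamma):
    # Assumed closed form of the Huber-regularised Euclidean norm |g|_gamma:
    # |g| - 1/(2 gamma) for |g| >= 1/gamma, and (gamma/2)|g|^2 otherwise.
    r = math.hypot(g[0], g[1])
    if r >= 1.0 / gamma:
        return r - 1.0 / (2.0 * gamma)
    return 0.5 * gamma * r * r

def huber_dual_bruteforce(g, gamma, n_angles=720, n_radii=200):
    # Brute-force approximation of
    #   sup { <q, g> - |q|^2 / (2 gamma) : |q|_2 <= 1 }
    # on a polar grid of the unit disc (hypothetical resolution).
    best = -math.inf
    for a in range(n_angles):
        theta = 2.0 * math.pi * a / n_angles
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n_radii + 1):
            rho = k / n_radii
            value = rho * (c * g[0] + s * g[1]) - rho * rho / (2.0 * gamma)
            best = max(best, value)
    return best

# Hypothetical sample points: one in the linear regime, one in the
# quadratic regime, and the origin.
samples = [((3.0, 4.0), 2.0), ((0.3, 0.4), 1.0), ((0.0, 0.0), 5.0)]
for g, gamma in samples:
    closed = huber_closed_form(g, gamma)
    dual = huber_dual_bruteforce(g, gamma)
    norm = math.hypot(g[0], g[1])
    # The last entry checks |g| - 1/(2 gamma) <= |g|_gamma <= |g|.
    print(g, gamma, closed, dual, norm - 1.0 / (2.0 * gamma) <= closed <= norm)
```

The grid value never exceeds the closed form and approaches it as the grid is refined, which is the pointwise analogue of the two-sided bound between the Huberised and the non-smoothed regulariser.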


We also show here that the marginal regularisation functional $T^\gamma(\,\cdot\,;\alpha)$ is weakly* lower semicontinuous on $Y$. Choosing $K$ as in Example 2.2 and Example 2.3, this provides in particular a proof that TV and $TGV^2$ are lower semicontinuous with respect to weak convergence in $L^2(\Omega)$ when $n = 1,2$.

Lemma 4.2. Suppose $\alpha \in \operatorname{int} P_\alpha$, and Assumption A-KA holds. Then $T^\gamma(\,\cdot\,;\alpha)$ is lower semicontinuous with respect to weak* convergence in $Y$, and continuous with respect to strong convergence in $\mathcal{R}(K)$.

Proof. Let $v^k \rightharpoonup v$ weakly* in $Y$. By the Banach-Steinhaus theorem, the sequence is bounded in $Y$. From the definition

\[ T^\gamma(v;\alpha) := \inf_{u \in K^{-1}v} R^\gamma(Au;\alpha). \]

Therefore, if we pick $\epsilon > 0$ and $u^k \in K^{-1}v^k$ such that

\[ R^\gamma(Au^k;\alpha) \le T^\gamma(v^k;\alpha) + \epsilon, \]

then referral to Assumption A-KA yields for some constant $c > 0$ the bound

\[ c\|u^k\|_X \le \|v^k\|_Y + R^\gamma(Au^k;\alpha) \le \|v^k\|_Y + T^\gamma(v^k;\alpha) + \epsilon. \]

Without loss of generality, we may assume that

\[ \liminf_{k \to \infty} T^\gamma(v^k;\alpha) < \infty, \]

because otherwise there is nothing to prove. Then $\{u^k\}_{k=1}^\infty$ is bounded in $X$, and therefore admits a weakly* convergent subsequence. Let $u$ be the limit of this, unrelabelled, sequence. Since $K$ is continuous, we find that $Ku = v$. But $R^\gamma(A\,\cdot\,;\alpha)$ is clearly weak* lower semicontinuous in $X$; see Lemma 4.1. Thus

\[ T^\gamma(v;\alpha) \le R^\gamma(Au;\alpha) \le \liminf_{k \to \infty} R^\gamma(Au^k;\alpha) \le \liminf_{k \to \infty} T^\gamma(v^k;\alpha) + \epsilon. \]

Since $\epsilon > 0$ was arbitrary, this proves weak* lower semicontinuity.

To see continuity with respect to strong convergence in $\mathcal{R}(K)$, we observe that if $v = Ku \in \mathcal{R}(K)$, then by the boundedness of the operators $\{A_j\}_{j=1}^N$ we get

\[ T^\gamma(v;\alpha) \le R^\gamma(Au;\alpha) \le R(Au;\alpha) \le C\|u\|, \]

for some constant $C > 0$. So we know that $T^\gamma(\,\cdot\,;\alpha)|_{\mathcal{R}(K)}$ is finite-valued and convex. Therefore it is continuous [29, Lemma 1.2.1]. □




4.3. From $\Phi$-strict to strong convergence


In Proposition 5.2, forming part of the proof of our main theorems, we will need to pass from "$\Phi$-strict convergence" of $Ku^k$ to $v$ to strong convergence, using the following lemmas. The former means that $\Phi(Ku^k) \to \Phi(v)$ and $Ku^k \rightharpoonup v$ weakly* in $Y$. By strong convexity in a Banach space $Y$, we mean the existence of $\gamma > 0$ such that for every $y \in Y$ and $z \in \partial\Phi(y) \subset Y^*$ holds

\[ \Phi(y') - \Phi(y) \ge (z|y' - y) + \frac{\gamma}{2}\|y' - y\|_Y^2, \quad (y' \in Y), \]

where $(z|y)$ denotes the dual product, and the subdifferential $\partial\Phi(y)$ is defined by $z$ satisfying the same expression with $\gamma = 0$. With regard to more advanced strict convergence results, we point the reader to [25, 38, 35].
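To illustrate why $\Phi$-strict convergence is a genuinely stronger requirement than weak* convergence, the following Python sketch looks at the unit vectors $e_k$ in a truncated sequence space with the quadratic fidelity $\Phi(v) = \frac{1}{2}\|f - v\|^2$. The truncation length, the choice of $f$ and all function names are illustrative assumptions. A finite truncation can only mimic the infinite-dimensional behaviour, where $e_k$ converges weakly but not strongly to zero.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi(v, f):
    # Quadratic fidelity Phi(v) = 0.5 * ||f - v||^2 (strongly convex, gamma = 1).
    return 0.5 * sum((fi - vi) ** 2 for fi, vi in zip(f, v))

def unit_vector(k, n):
    return [1.0 if i == k else 0.0 for i in range(n)]

n = 500                                   # hypothetical truncation length
f = [1.0 / (i + 1) for i in range(n)]     # fixed square-summable data
zero = [0.0] * n                          # candidate weak limit of e_k

for k in (1, 10, 100, 400):
    e_k = unit_vector(k, n)
    pairing = dot(f, e_k)                 # equals f[k]; decays as k grows
    gap = phi(e_k, f) - phi(zero, f)      # equals 0.5 - f[k]; tends to 0.5
    dist_sq = dot(e_k, e_k)               # ||e_k - 0||^2 stays equal to 1
    print(k, pairing, gap, dist_sq)
```

The pairings decay while the fidelity gap stays close to 1/2, so $\Phi(e_k)$ does not converge to $\Phi(0)$. This is consistent with the definition above: the strong convexity inequality with $y = 0$ bounds the gap from below by a vanishing pairing plus $\frac{1}{2}\|e_k\|^2$.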

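Lemma 4.3. Suppose $Y$ is a Banach space, and $\Phi : Y \to (-\infty,\infty]$ strongly convex. If $v^k \rightharpoonup \hat v \in \operatorname{dom}\partial\Phi$ weakly* in $Y$ and $\Phi(v^k) \to \Phi(\hat v)$, then $v^k \to \hat v$ strongly in $Y$.

Remark 4.1. By standard convex analysis [29], $\hat v \in \operatorname{dom}\partial\Phi$ if $\Phi$ has a finite-valued point of continuity and $\hat v \in \operatorname{int}\operatorname{dom}\Phi$.

Proof. We first of all note that $-\infty < \Phi(\hat v) < \infty$ because $\hat v \in \operatorname{dom}\partial\Phi$ implies $\hat v \in \operatorname{dom}\Phi$. Let us pick $z \in \partial\Phi(\hat v)$. From the strong convexity of $\Phi$, for some $\gamma > 0$ then

\[ \Phi(v^k) - \Phi(\hat v) \ge (z|v^k - \hat v) + \frac{\gamma}{2}\|v^k - \hat v\|_Y^2. \]

Taking the limit infimum, we observe

\[ 0 = \lim_{k \to \infty}\big(\Phi(v^k) - \Phi(\hat v)\big) \ge \liminf_{k \to \infty} \frac{\gamma}{2}\|v^k - \hat v\|_Y^2. \]

This proves strong convergence. □

We now use the lemma to show strong convergence of minimising sequences.

Lemma 4.4. Suppose $\Phi$ is strongly convex, satisfies Assumption A-$\Phi$, and that $C \subset Y$ is non-empty, closed, and convex with $\operatorname{int} C \cap \operatorname{dom}\Phi \ne \emptyset$. Let

\[ \hat v := \operatorname*{arg\,min}_{v \in C} \Phi(v). \]

If $\{v^k\}_{k=1}^\infty \subset C$ with $\lim_{k \to \infty}\Phi(v^k) = \Phi(\hat v)$, then $v^k \to \hat v$ strongly in $Y$.

Proof. By the strict convexity of $\Phi$, implied by strong convexity, and the assumptions on $C$, $\hat v$ is unique and well-defined. Moreover $\hat v \in \operatorname{dom}\partial\Phi$. Indeed, our assumptions show the existence of a point $v \in \operatorname{int} C \cap \operatorname{dom}\Phi$. The indicator function $\delta_C$ is then continuous at $v$, and so standard subdifferential calculus (see, e.g., [29, Proposition 1.5.6]) implies that $\partial(\Phi + \delta_C)(\hat v) = \partial\Phi(\hat v) + \partial\delta_C(\hat v)$. But $0 \in \partial(\Phi + \delta_C)(\hat v)$ because $\hat v \in \operatorname*{arg\,min}_{v \in Y} \Phi(v) + \delta_C(v)$. This implies that $\partial\Phi(\hat v) \ne \emptyset$. Consequently also $\hat v \in \operatorname{dom}\Phi$, and $\Phi(\hat v) \in \mathbb{R}$.

Using the coercivity of $\Phi$ in Assumption A-$\Phi$ we then find that $\{v^k\}_{k=1}^\infty$ is bounded in $Y$, at least after moving to an unrelabelled tail of the sequence with $\Phi(v^k) \le \Phi(\hat v) + 1$. Since $Y$ is a dual space, the unit ball is weak* compact, and we deduce the existence of a subsequence, unrelabelled, and $v \in Y$ such that $v^k \rightharpoonup v$ weakly* in $Y$. By the weak* lower semicontinuity (Assumption A-$\Phi$), we deduce

\[ \Phi(v) \le \liminf_{k \to \infty} \Phi(v^k) = \Phi(\hat v). \]

Since each $v^k \in C$, and $C$ is closed, also $v \in C$. Therefore, by the strict convexity and the definition of $\hat v$, necessarily $v = \hat v$. Therefore $v^k \rightharpoonup \hat v$ weakly* in $Y$, and $\Phi(v^k) \to \Phi(\hat v)$. Lemma 4.3 now shows that $v^k \to \hat v$ strongly in $Y$. □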


5. Proofs of the main results 


We now prove the existence, continuity, and non-degeneracy (interior solution) results of Section 3 
through a series of lemmas and propositions, starting from general ones that are then specialised to 
provide the natural conditions presented in Section 3. 

5.1. Existence and lower semicontinuity under lower bounds 

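Our principal tool for proving existence is the following proposition. We will in the rest of this section concentrate on proving the existence of the set $\mathcal{K}$ in the statement. We base this on the natural conditions of Section 3.

Proposition 5.1 (Existence on compact parameter domain). Suppose Assumption A-KA and A-$\Phi$ hold. With $\epsilon \ge 0$ and $\gamma \in (0,\infty]$ fixed, if there exists a compact set $\mathcal{K} \subset \operatorname{int} P_\alpha$ with

\[ \inf_{\alpha \in P_\alpha \setminus \mathcal{K}} F(u_{\alpha,\gamma,\epsilon}) > \inf_{\alpha \in P_\alpha} F(u_{\alpha,\gamma,\epsilon}), \tag{5.1} \]

then there exists a solution $\hat\alpha_{\gamma,\epsilon} \in \operatorname{int} P_\alpha$ to $(P^{\gamma,\epsilon})$. Moreover, the mapping

\[ I_{\gamma,\epsilon}(\alpha) := F(u_{\alpha,\gamma,\epsilon}) \]

is lower semicontinuous within $\operatorname{int} P_\alpha$.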


The proof depends on the following two lemmas that will be useful later on as well. 

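Lemma 5.1 (Lower semicontinuity of the fidelity with varying parameters). Suppose Assumption A-KA, A-$\Phi$, and A-H hold. Let $u^k \rightharpoonup u$ weakly* in $X$, and $(\alpha^k,\gamma^k,\epsilon^k) \to (\alpha,\gamma,\epsilon) \in \operatorname{int} P_\alpha \times (0,\infty] \times [0,\infty)$. Then

\[ J^{\gamma,\epsilon}(u;\alpha) \le \liminf_{k \to \infty} J^{\gamma^k,\epsilon^k}(u^k;\alpha^k). \]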


Proof. Let e > 0 be such that aj > e, {j = 1,..., N). We then deduce for large k and some C = (7(0, 7 ) 
that 

limsup e,..., e) < limsup e,... ,e) + C 

^ . (5.2) 

< lim sup J'^ a^) -\-C < 00 . 

k 

Here we have assumed the final inequality to hold. This comes without loss of generality, because otherwise there is nothing to prove. Observe that this holds even if $\{\alpha^k\}_{k=1}^\infty$ is not bounded. In particular, if $\epsilon > 0$, restricting $k$ to be large, we may assume that

\[ C_1 := \sup_k H(u^k) < \infty. \tag{5.3} \]

We recall that

\[ J^{\gamma,\epsilon}(u;\alpha) := \epsilon H(u) + \Phi(Ku) + \sum_{j=1}^N \alpha_j\|A_ju\|_{\gamma,j}. \tag{5.4} \]


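We want to show lower semicontinuity of each of the terms in turn. We start with the smoothing term. If $\epsilon > 0$, using (5.3), we write

\[ \epsilon^k H(u^k) = (\epsilon^k - \epsilon)H(u^k) + \epsilon H(u^k) \ge -|\epsilon^k - \epsilon| C_1 + \epsilon H(u^k). \]

By the convergence $\epsilon^k \to \epsilon$, and the weak* lower semicontinuity of $H$, we find that

\[ \epsilon H(u) \le \liminf_{k \to \infty} \epsilon^k H(u^k). \tag{5.5} \]

If $\epsilon = 0$, we have

\[ \epsilon H(u) = 0 \cdot \infty = 0, \]

while still

\[ 0 \le \sup_k \epsilon^k H(u^k) < \infty. \]

Thus (5.5) follows.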


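The fidelity term $\Phi \circ K$ is weak* lower semicontinuous by the continuity of $K$ and the weak* lower semicontinuity of $\Phi$. It therefore remains to consider the terms in (5.4) involving both the regularisation parameters $\alpha$, as well as the Huberisation parameter $\gamma$. Indeed using the dual formulation (2.6) of the Huberised norm, we have for some constant $C'$ that

\[ \begin{aligned} \|A_ju^k\|_{\gamma^k,j} &= \sup_{\|\varphi(x)\| \le 1} \int_\Omega \varphi \, dA_ju^k - \int_\Omega \frac{1}{2\gamma^k}\|\varphi\|^2 \, dx \\ &\ge \sup_{\|\varphi(x)\| \le 1} \int_\Omega \varphi \, dA_ju^k - \int_\Omega \frac{1}{2\gamma}\|\varphi\|^2 \, dx - C'|\gamma^{-1} - (\gamma^k)^{-1}| \\ &= \|A_ju^k\|_{\gamma,j} - C'|\gamma^{-1} - (\gamma^k)^{-1}|. \end{aligned} \]

Thus, if every component of $\alpha$ is finite, we get

\[ \begin{aligned} \alpha_j^k\|A_ju^k\|_{\gamma^k,j} &= \alpha_j\|A_ju^k\|_{\gamma^k,j} + (\alpha_j^k - \alpha_j)\|A_ju^k\|_{\gamma^k,j} \\ &\ge \alpha_j\|A_ju^k\|_{\gamma,j} - \alpha_jC'|\gamma^{-1} - (\gamma^k)^{-1}| - |\alpha_j^k - \alpha_j|\,\|A_ju^k\|_{\gamma^k,j}. \end{aligned} \]

It follows from (5.2) that the sequence $\{A_ju^k\}_{k=1}^\infty$ is bounded in $\mathcal{M}(\Omega;\mathbb{R}^{m_j})$ for each $j = 1,\ldots,N$. Thus

\[ \liminf_{k \to \infty} \alpha_j^k\|A_ju^k\|_{\gamma^k,j} \ge \liminf_{k \to \infty} \alpha_j\|A_ju^k\|_{\gamma,j} \ge \alpha_j\|A_ju\|_{\gamma,j}, \]

where the final step follows from Lemma 4.1.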


It remains to consider the case that a G P“ \ Va, i.e., when = 00 for some i. We may pick 
sequences {fdj}’^i, {j = l,...,Ai'), such that /ij aj. Further, we may find such that 

fdj’^ < with Oj = lim^^oo f^j'^ and /ij = lim^^oo Pj’^- Then, by the bounded case studied above 


lim inf a^llAjU^ 
k^oo ^ 


> liminf 
' k^oo ■' 


> P^j\\Aju\\^j. 


But {ctj\\AjU^\\^k is bounded by (5.2), and 


lim inf (d^AlAjU 
l-^oo ^ 


7J — 


7J- 


Thus lower semicontinuity follows. 


□ 


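Lemma 5.2 (Convergence of reconstructions away from boundary). Suppose Assumption A-KA, A-$\Phi$, and A-H hold. Let $(\alpha^k,\gamma^k,\epsilon^k) \to (\alpha,\gamma,\epsilon)$ in $\operatorname{int} P_\alpha \times (0,\infty] \times [0,\infty)$. Then we can find $u_{\alpha,\gamma,\epsilon} \in \operatorname*{arg\,min}_{u \in X} J^{\gamma,\epsilon}(u;\alpha)$ and extract a subsequence satisfying

\[ J^{\gamma^k,\epsilon^k}(u_{\alpha^k,\gamma^k,\epsilon^k};\alpha^k) \to J^{\gamma,\epsilon}(u_{\alpha,\gamma,\epsilon};\alpha), \tag{5.6a} \]
\[ u_{\alpha^k,\gamma^k,\epsilon^k} \rightharpoonup u_{\alpha,\gamma,\epsilon} \text{ weakly* in } X, \text{ and} \tag{5.6b} \]
\[ Ku_{\alpha^k,\gamma^k,\epsilon^k} \to Ku_{\alpha,\gamma,\epsilon} \text{ strongly in } Y. \tag{5.6c} \]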


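Proof. By Lemma 5.1, we have

\[ \liminf_{k \to \infty} J^{\gamma^k,\epsilon^k}(u_{\alpha^k,\gamma^k,\epsilon^k};\alpha^k) \ge \min_{u \in X} J^{\gamma,\epsilon}(u;\alpha). \]

We also want the opposite inequality

\[ \limsup_{k \to \infty} J^{\gamma^k,\epsilon^k}(u_{\alpha^k,\gamma^k,\epsilon^k};\alpha^k) \le \min_{u \in X} J^{\gamma,\epsilon}(u;\alpha). \tag{5.7} \]

Let $\delta > 0$. If $\epsilon = 0$, we use Assumption A-H on $u = u_{\alpha,\gamma,\epsilon}$ to produce $u_\delta$. Otherwise, we set $u_\delta = u_{\alpha,\gamma,\epsilon}$. In either case

\[ J^{\gamma,\epsilon}(u_\delta;\alpha) \le J^{\gamma,\epsilon}(u_{\alpha,\gamma,\epsilon};\alpha) + \delta. \]

In particular $A_ju_\delta = 0$ if $\alpha_j = \infty$. Then for large enough $k$ we obtain

\[ \begin{aligned} J^{\gamma^k,\epsilon^k}(u_{\alpha^k,\gamma^k,\epsilon^k};\alpha^k) &\le J^{\gamma^k,\epsilon^k}(u_\delta;\alpha^k) \le J^{\gamma,\epsilon}(u_\delta;\alpha) + \delta \\ &\le J^{\gamma,\epsilon}(u_{\alpha,\gamma,\epsilon};\alpha) + 2\delta. \end{aligned} \tag{5.8} \]

Since $\delta > 0$ was arbitrary, this proves (5.7) and consequently (5.6a), that is

\[ \lim_{k \to \infty} J^{\gamma^k,\epsilon^k}(u_{\alpha^k,\gamma^k,\epsilon^k};\alpha^k) = \min_{u \in X} J^{\gamma,\epsilon}(u;\alpha) = J^{\gamma,\epsilon}(u_{\alpha,\gamma,\epsilon};\alpha) \le J^{\gamma,\epsilon}(0;\alpha) = \epsilon H(0) + \Phi(0) < \infty. \tag{5.9} \]

Minding Proposition 4.1, this allows us to extract a subsequence of $\{u_{\alpha^k,\gamma^k,\epsilon^k}\}_{k=1}^\infty$, unrelabelled, and convergent weakly* in $X$ to some

\[ \tilde u \in \operatorname*{arg\,min}_{u \in X} J^{\gamma,\epsilon}(u;\alpha). \]

We may choose $u_{\alpha,\gamma,\epsilon} := \tilde u$. This shows (5.6b).

If $K$ is compact, we may further assume that $Ku_{\alpha^k,\gamma^k,\epsilon^k} \to Ku_{\alpha,\gamma,\epsilon}$ strongly in $Y$, showing (5.6c). If $K$ is not compact, $\Phi$ is continuous and strongly convex by Assumption A-$\Phi$, and we still have $\Phi(Ku_{\alpha^k,\gamma^k,\epsilon^k}) \to \Phi(Ku_{\alpha,\gamma,\epsilon})$. The assumptions of Lemma 4.3 are therefore satisfied. This shows (5.6c). □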


From the lower semicontinuity Lemma 5.1, we immediately obtain the following standard result.

Theorem 5.1 (Existence of solutions to the reconstruction sub-problem). Let $\Omega \subset \mathbb{R}^n$ be a bounded open domain. Suppose Assumption A-KA and A-$\Phi$ hold, and that $\alpha \in \operatorname{int} P_\alpha$, $\epsilon \ge 0$, and $\gamma \in (0,\infty]$. Then $(D^{\gamma,\epsilon})$ admits a minimiser $u_{\alpha,\gamma,\epsilon} \in X \cap \operatorname{dom}\epsilon H$.

Proof. By Lemma 5.1, fixing $(\alpha^k,\gamma^k,\epsilon^k) = (\alpha,\gamma,\epsilon)$, the functional $J^{\gamma,\epsilon}(\,\cdot\,;\alpha)$ is lower semicontinuous with respect to weak* convergence in $X$. So we just have to establish a weak* convergent minimising sequence. Towards this end, we let $\{u^k\}_{k=1}^\infty \subset \operatorname{dom}\epsilon H$ be a minimising sequence for $(D^{\gamma,\epsilon})$. We may assume without loss of generality that $\sup_k J^{\gamma,\epsilon}(u^k;\alpha) < \infty$. By Proposition 4.1 and the inequality $J^{\gamma,0} \le J^{\gamma,\epsilon}$ we deduce $\sup_k \|u^k\|_X < \infty$. After possibly switching to a subsequence, unrelabelled, we may therefore assume $\{u^k\}_{k=1}^\infty$ weakly* convergent in $X$ to some $\hat u \in X$. This proves the claim. □

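Proof of Proposition 5.1. Let us take a sequence $\{\alpha^k\}_{k=1}^\infty$ convergent to $\alpha \in \operatorname{int} P_\alpha$. Application of Lemma 5.2 with $\gamma^k = \gamma$ and $\epsilon^k = \epsilon$ and the weak* lower semicontinuity of $F$ immediately show the lower semicontinuity of $I_{\gamma,\epsilon}$ within $\operatorname{int} P_\alpha$.

Finally, if $\{\alpha^k\}_{k=1}^\infty$ is a minimising sequence for $(P^{\gamma,\epsilon})$, by assumption we may take it to lie in $\mathcal{K}$. By the compactness of $\mathcal{K}$, we may assume the sequence convergent to some $\hat\alpha \in \mathcal{K}$. By the lower semicontinuity established above, $\hat u = u_{\hat\alpha,\gamma,\epsilon}$ is a solution to $(P^{\gamma,\epsilon})$. □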


5.2. Towards $\Gamma$-convergence and continuity of the solution map


The next lemma, immediate from the previous one, will form the first part of the proof of continuity of the solution map. As its condition, we introduce a stronger form of (5.1) that is uniform over a range of $\epsilon$ and $\gamma$.

Lemma 5.3 ($\Gamma$-lower limit of the cost map in terms of regularisation). Suppose Assumption A-KA, A-$\Phi$, and A-H hold. Let $\mathcal{K} \subset \operatorname{int} P_\alpha$ be compact. Then

\[ I_{\gamma,\epsilon}(\alpha) \le \liminf_{(\alpha',\gamma',\epsilon') \to (\alpha,\gamma,\epsilon)} I_{\gamma',\epsilon'}(\alpha'), \tag{5.10} \]

when the convergence is within $\mathcal{K} \times [\bar\gamma,\infty] \times [0,\bar\epsilon]$.

Proof. Consequence of (5.6b) of Lemma 5.2 and the weak* lower semicontinuity of $F$. □


The next lemma will be used to get partial strong convergence of minimisers as we approach $\partial P_\alpha$. This will then be used to derive simplified conditions for this not happening. This result is the counterpart of Lemma 5.2 that studied convergence of reconstructions away from the boundary, and depends on the additional Assumption A-$\delta$. This is the only place where we use the assumption, and replacing this lemma by one with different assumptions would allow us to remove Assumption A-$\delta$.

Lemma 5.4 (Convergence of reconstructions at the boundary). Suppose Assumption A-KA, A-$\Phi$, and A-$\delta$ hold, and that $\Phi$ is strongly convex. Suppose $\{(\alpha^k,\gamma^k,\epsilon^k)\}_{k=1}^\infty \subset \operatorname{int} P_\alpha \times (0,\infty] \times [0,\bar\epsilon]$ satisfies $\alpha^k \to \alpha \in \partial P_\alpha$. If $\bar\epsilon = 0$ or Assumption A-H holds and $\bar\epsilon > 0$ is small enough, then $Ku_{\alpha^k,\gamma^k,\epsilon^k} \to f$ strongly in $Y$.


Proof. We denote for short $u^k := u_{\alpha^k,\gamma^k,\epsilon^k}$, and note that $f$ is unique by the strong convexity of $\Phi$. Since $\alpha \in \partial P_\alpha$, there exists an index $\ell \in \{1,\ldots,N\}$ such that $\alpha^k_\ell \to 0$. We let $\ell$ be the first such index, and pick arbitrary $\delta > 0$. We take $f_{\delta,\ell}$ as given by Assumption A-$\delta$, observing that the construction still holds with Huberisation, that is, for any $\gamma \in (0,\infty]$ and in particular any $\gamma = \gamma^k$, we have

^{Kfs,i)<5 + ^{f), 

\\^j5A\id < \\^J6A\t < and, 

= 0 . 


If we are aiming for e > 0, let us also pick fs/ by application of Assumption A-H io u = fs/. Otherwise, 
with e = 0, let us just set fs/ = fs/. Since 

tt GargmmJ' ’ {u]a ), 
uex 


we have 


H H {fs,e-,aA < H {fs,e;aA + S 


= e^H{hA + ^KfsA + BA\Afsx .«") + <5 
< e’^HUsA + Hf) + aA\AjsA\i + 2 - 5 . 

Observe that it is no problem if some index = 00 , because by definition as a minimiser achieves 
smaller value than above, and for the latter WAjf^AA^j = 0. Choosing e > 0 small enough, it 
follows for e G [0, e] that 

0 < ^KuA - Hf) <26 + alWAjsAA 


Choosing k large enough, we thus see that 


0 < ^{KuA - $(/) < (2 + ai)5. 


Letting $\delta \searrow 0$, we see that $\Phi(Ku^k) \to \Phi(f)$. Lemma 4.4 with $C = Y$ therefore shows that $Ku^k \to f$ strongly in $Y$. □


5.3. Minimality and co-coercivity 


Our remaining task is to show the existence of $\mathcal{K}$ for (5.1), and of a uniform $\mathcal{K}$ (see (5.15) below) for the application of Lemma 5.3. When the fidelity and cost functionals satisfy some additional conditions, we will now reduce this to the existence of $\tilde\alpha \in \operatorname{int} P_\alpha$ satisfying $F(u_{\tilde\alpha}) < F(\bar f)$ for a specific $\bar f \in K^{-1}f$. So far, we have made no reference to the data, the ground-truth $f_0$ or the corrupted measurement data $f$. We now assume this in an abstract way, and need a type of source condition, called minimality, relating the ground truth $f_0$ to the noisy data $f$. We will get back to how this is obtained later.

Definition 5.1. Let $p > 0$. We say that $v \in X$ is $(K,p)$-minimal if there exist $C \ge 0$ and $\varphi_v \in Y^*$ such that

\[ F(u) - F(v) \ge (\varphi_v|K(u - v)) - \frac{C}{p}\|K(u - v)\|_Y^p, \quad (u \in X). \]

Remark 5.1. If we can take $C = 0$, then the final condition just says that $K^*\varphi_v \in \partial F(v)$. This is a rather strong property. Also, instead of $t \mapsto t^p$ we could in the following proofs use any strictly increasing energy $\psi : [0,\infty) \to [0,\infty)$, $\psi(0) = 0$.


To deal with the smoothing term $\epsilon H$ with $\epsilon > 0$, we also need co-coercivity; for the justification of the term for the condition in (5.11) below, more often seen in the context of monotone operators, we refer to the equivalences in [4, Theorem 18.15].

Definition 5.2. We say that $F$ is $(K,p)$-co-coercive at $(u^*,\lambda^*) \in X \times X^*$, $\lambda^* \in \partial F(u^*)$, if

\[ F(u) - F(u^*) \le (\lambda^*|u - u^*) + \frac{C}{p}\|K(u - u^*)\|_Y^p, \quad (u \in X). \tag{5.11} \]

If $F$ is $(K,p)$-co-coercive at $(u,\lambda)$ for every $u \in X$ and $\lambda \in \partial F(u)$, we say that $F$ is $(K,p)$-co-coercive. If $p = 2$, we say that $F$ is simply $K$-co-coercive.

Remark 5.2. In essence, $K$-co-coercivity requires $F = F_0 \circ K$ and usual ($I$-)co-coercivity of $F_0$.
Lemma 5.5. Suppose $v \in X$ is $(K,p)$-minimal. If $\{u^k\}_{k=1}^\infty \subset X$ satisfies $Ku^k \to Kv$ in $Y$, then

\[ F(v) \le \liminf_{k \to \infty} F(u^k). \tag{5.12} \]

If, moreover, $F$ is $(K,p)$-co-coercive at $(v,K^*\varphi_v)$, then

\[ F(v) = \lim_{k \to \infty} F(u^k). \tag{5.13} \]

Proof. For (5.12), we use the $(K,p)$-minimality of $v$ to obtain

\[ F(u^k) - F(v) \ge (\varphi_v|K(u^k - v)) - \frac{C}{p}\|K(u^k - v)\|_Y^p, \quad (k = 1,2,3,\ldots). \]

Taking the limit, it follows that

\[ \liminf_{k \to \infty} F(u^k) \ge F(v). \]

If we additionally have the $(K,p)$-co-coercivity at $(v,K^*\varphi_v)$, then, likewise

\[ F(u^k) - F(v) \le (\varphi_v|K(u^k - v)) + \frac{C}{p}\|K(u^k - v)\|_Y^p, \quad (k = 1,2,3,\ldots). \]

From this we immediately get

\[ \limsup_{k \to \infty} F(u^k) \le F(v). \]

□


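Proposition 5.2 (One-point conditions under co-coercivity). Suppose Assumption A-KA, A-$\Phi$ and A-$\delta$ hold, and that $\Phi$ is strongly convex. If $\bar f$ is $(K,q)$-minimal and some $\tilde\alpha \in \operatorname{int} P_\alpha$ and $u_{\tilde\alpha} \in \operatorname*{arg\,min}_{u \in X} J(u;\tilde\alpha)$ satisfy

\[ F(u_{\tilde\alpha}) < F(\bar f), \quad \text{and} \quad u_{\tilde\alpha} \text{ is } (K,p)\text{-minimal}, \tag{5.14} \]

then there exist $\bar\gamma, \bar\epsilon > 0$ such that the following hold.

(i) For each $\epsilon \in [0,\bar\epsilon]$ and $\gamma \in [\bar\gamma,\infty]$ there exists a compact set $\mathcal{K} \subset \operatorname{int} P_\alpha$ such that (5.1) holds.

(ii) If, moreover, Assumption A-H holds and $F$ is $(K,p)$-co-coercive for any $p > 0$, then there exists a compact set $\mathcal{K} \subset \operatorname{int} P_\alpha$ such that

\[ \inf_{\alpha \in P_\alpha \setminus \mathcal{K}} F(u_{\alpha,\gamma,\epsilon}) > \inf_{\alpha \in P_\alpha} F(u_{\alpha,\gamma,\epsilon}), \quad (\gamma \in [\bar\gamma,\infty],\ \epsilon \in [0,\bar\epsilon]). \tag{5.15} \]

In both cases, the existence of $\mathcal{K}$ says that every solution $\hat\alpha$ to $(P^{\gamma,\epsilon})$ satisfies $\hat\alpha \in \mathcal{K}$.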


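Proof. We note that $f$ is unique by the strong convexity of $\Phi$. Let us first prove (i). In fact, let us pick $\bar\gamma, \bar\epsilon > 0$ and assume with $\tilde\alpha$ fixed that

\[ u_{\tilde\alpha,\gamma,\epsilon} \in \operatorname*{arg\,min}_{u \in X} J^{\gamma,\epsilon}(u;\tilde\alpha) \text{ satisfy } F(u_{\tilde\alpha,\gamma,\epsilon}) < F(\bar f), \quad (\gamma \in [\bar\gamma,\infty],\ \epsilon \in [0,\bar\epsilon]). \tag{5.16} \]

We want to show the existence of a compact set $\mathcal{K} \subset \operatorname{int} P_\alpha$ such that solutions $\hat\alpha$ to $(P^{\gamma,\epsilon})$ satisfy $\hat\alpha \in \mathcal{K}$ whenever $(\gamma,\epsilon) \in [\bar\gamma,\infty] \times [0,\bar\epsilon]$, for $\bar\gamma$ and $\bar\epsilon$ to be determined during the course of the proof. We thus let $(\alpha^k,\gamma^k,\epsilon^k) \in P_\alpha \times [\bar\gamma,\infty] \times [0,\bar\epsilon]$. Since this set is compact, we may assume that $\alpha^k \to \hat\alpha \in P_\alpha$, $\epsilon^k \to \epsilon$, and $\gamma^k \to \gamma$. Suppose $\hat\alpha \in \partial P_\alpha$. By Lemma 5.4 then $Ku^k \to f$ strongly in $Y$ for small enough $\bar\epsilon$, with no conditions on $\bar\gamma$. Further by the $(K,q)$-minimality of $\bar f$ and Lemma 5.5 then

\[ F(\bar f) \le \liminf_{k \to \infty} F(u^k). \tag{5.17} \]

If we fix $\gamma^k := \gamma$ and $\epsilon^k := \epsilon$, and pick $\{\alpha^k\}_{k=1}^\infty$ as a minimising sequence for $(P^{\gamma,\epsilon})$, we find that (5.17) is in contradiction to (5.14). Necessarily then $\hat\alpha \in \operatorname{int} P_\alpha$. By the lower semicontinuity result of Proposition 5.1, $\hat\alpha$ therefore has to solve $(P^{\gamma,\epsilon})$. We have proved (i), because, if $\mathcal{K}$ did not exist, we could choose $\hat\alpha \in \partial P_\alpha$.

If $\gamma^k \to \gamma$, $\epsilon^k \to \epsilon$, and $\alpha^k$ solves $(P^{\gamma,\epsilon})$ for $(\gamma,\epsilon) = (\gamma^k,\epsilon^k)$, then $\alpha^k \in \operatorname{int} P_\alpha$ by (i). Now (5.17) is in contradiction to (5.16). Therefore (ii) holds if (5.16) holds.

It remains to verify (5.16) for $\bar\epsilon > 0$ small enough and $\bar\gamma > 0$ large enough. By Lemma 5.2, we may find a sequence $\epsilon^k \searrow 0$ and $\gamma^k \nearrow \infty$ such that $Ku_{\tilde\alpha,\gamma^k,\epsilon^k} \to K\tilde u$ for some $\tilde u \in \operatorname*{arg\,min} J(\,\cdot\,;\tilde\alpha)$. Since $\Phi$ is strictly convex, and both $u_{\tilde\alpha}, \tilde u \in \operatorname*{arg\,min}_{u \in X} J(u;\tilde\alpha)$, we find that $Ku_{\tilde\alpha} = K\tilde u$. Recalling the $(K,p)$-minimality and -co-coercivity at $(u_{\tilde\alpha},K^*\varphi_{u_{\tilde\alpha}})$, Lemma 5.5 and (5.14) now yield

\[ \limsup_{k \to \infty} F(u_{\tilde\alpha,\gamma^k,\epsilon^k}) = F(u_{\tilde\alpha}) < F(\bar f). \]

Since we may repeat the above arguments on arbitrary sequences $(\gamma^k,\epsilon^k) \to (\infty,0)$, we conclude that (5.16) holds for small enough $\bar\epsilon > 0$ and large enough $\bar\gamma > 0$. □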

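We now show the $\Gamma$-convergence of the cost map, and as a consequence the outer semicontinuity of the solution map. For an introduction to $\Gamma$-convergence, we refer to [9, 21].

Proposition 5.3 ($\Gamma$-convergence of the cost map and continuity of the solution map). Suppose Assumption A-KA, Assumption A-$\Phi$, and Assumption A-H hold along with (5.15). Suppose, moreover, that $F$ is $(K,p)$-co-coercive, and every solution $u_{\alpha,\gamma,\epsilon}$ to $(D^{\gamma,\epsilon})$ is $(K,p)$-minimal with $\alpha \in \mathcal{K}$ and $(\gamma,\epsilon) \in [\bar\gamma,\infty] \times [0,\bar\epsilon]$. Then

\[ I_{\gamma',\epsilon'}|_{\mathcal{K}} \xrightarrow{\Gamma} I_{\gamma,\epsilon}|_{\mathcal{K}} \tag{5.18} \]

when $(\gamma',\epsilon') \to (\gamma,\epsilon)$ within $[\bar\gamma,\infty] \times [0,\bar\epsilon]$ and $\mathcal{K}$ is as in (5.15). Moreover, the solution map

\[ S(\gamma,\epsilon) = \operatorname*{arg\,min}_{\alpha \in P_\alpha} I_{\gamma,\epsilon}(\alpha) \]

is outer semicontinuous within $[\bar\gamma,\infty] \times [0,\bar\epsilon]$.

Proof. Lemma 5.3 shows the $\Gamma$-lower limit (5.10). We still have to show the $\Gamma$-upper limit. This means that given $\alpha \in \mathcal{K}$ and $(\gamma^k,\epsilon^k) \to (\gamma,\epsilon)$ within $[\bar\gamma,\infty] \times [0,\bar\epsilon]$, we have to show the existence of a sequence $\{\alpha^k\}_{k=1}^\infty \subset \mathcal{K}$ such that

\[ I_{\gamma,\epsilon}(\alpha) \ge \limsup_{k \to \infty} I_{\gamma^k,\epsilon^k}(\alpha^k). \]

We claim that we can take $\alpha^k = \alpha$. With $u^k := u_{\alpha,\gamma^k,\epsilon^k}$, Lemma 5.2 gives a subsequence satisfying $Ku^k \to K\hat u$ strongly with $\hat u$ a minimiser of $J^{\gamma,\epsilon}(\,\cdot\,;\alpha)$. We just have to show that

\[ F(\hat u) = \lim_{k \to \infty} F(u^k). \tag{5.19} \]

Since $F$ is $(K,p)$-co-coercive, and $\hat u$ by assumption $(K,p)$-minimal, this follows from Lemma 5.5.

We have therefore established the $\Gamma$-convergence of $I_{\gamma',\epsilon'}|_{\mathcal{K}}$ to $I_{\gamma,\epsilon}|_{\mathcal{K}}$ as $(\gamma',\epsilon') \to (\gamma,\epsilon)$ within $[\bar\gamma,\infty] \times [0,\bar\epsilon]$. Our assumption (5.15) says that the family $\{I_{\gamma,\epsilon} \mid (\gamma,\epsilon) \in [\bar\gamma,\infty] \times [0,\bar\epsilon]\}$ is equi-mildly coercive in the sense of [9]. Therefore, by the properties of $\Gamma$-convergence, see [9, Theorem 1.12], the solution map is outer semicontinuous. □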

5.4. The $L^2$-squared fidelity with $(K,2)$-co-coercive cost

In what follows, we seek to prove (5.14) for the $L^2$-squared fidelity with $(K,2)$-co-coercive cost functionals by imposing more natural conditions derived from (5.20) in the next lemma.

Lemma 5.6 (Natural conditions for $L^2$-squared 2-co-coercive case). Suppose Assumption A-KA and A-$\delta$ hold. Let $Y$ be a Hilbert space, $f \in \mathcal{R}(K)$, and

\[ \Phi(v) = \tfrac{1}{2}\|f - v\|_Y^2. \]

Then (5.14) holds if $\bar f$ is $(K,2)$-minimal, $F$ is $(K,2)$-co-coercive at $(\bar f,K^*\varphi_{\bar f})$ with $K^*\varphi_{\bar f} \in \partial F(\bar f)$, and

\[ T(f;\tilde\alpha) > T(f - t\varphi_{\bar f};\tilde\alpha) \tag{5.20} \]

for some $\tilde\alpha \in \operatorname{int} P_\alpha$ and $t \in (0,1/C]$, where $C$ is the co-coercivity constant.


Here we recall the definition of $T(\,\cdot\,;\alpha)$ from (2.7).

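Proof. Let $\alpha \in \operatorname{int} P_\alpha$. We have from the co-coercivity (5.11) that

\[ F(\bar f) - F(u_\alpha) \ge -(K^*\varphi_{\bar f}|u_\alpha - \bar f) - \frac{C}{2}\|Ku_\alpha - f\|_Y^2. \]

Using the definition of the subdifferential,

\[ F(u) - F(\bar f) \ge (K^*\varphi_{\bar f}|u - \bar f), \quad (u \in X). \]

Summing, therefore

\[ F(u) - F(u_\alpha) \ge (K^*\varphi_{\bar f}|u - u_\alpha) - \frac{C}{2}\|Ku_\alpha - f\|_Y^2, \quad (u \in X). \tag{5.21} \]

Setting $u = \bar f$, we deduce

\[ F(\bar f) - F(u_\alpha) \ge (\varphi_{\bar f}|f - Ku_\alpha) - \frac{C}{2}\|Ku_\alpha - f\|_Y^2. \tag{5.22} \]

Let $\alpha = t\tilde\alpha$ for some $t > 0$. Since $\Phi \circ K$ is continuous with $\operatorname{dom}(\Phi \circ K) = X$, the optimality conditions for $u_\alpha$ solving $(D^{\gamma,\epsilon})$ state [29, Proposition 1.5.6]

\[ 0 \in K^*(Ku_{t\tilde\alpha} - f) + tA^*[\partial R(\,\cdot\,;\tilde\alpha)](Au_{t\tilde\alpha}). \tag{5.23} \]

Because $u_{t\tilde\alpha}$ solves $(D^{\gamma,\epsilon})$, we have $R(Au_{t\tilde\alpha};\tilde\alpha) = T(Ku_{t\tilde\alpha};\tilde\alpha)$. By Lemma 5.7 below, therefore

\[ 0 \in K^*(Ku_{t\tilde\alpha} - f) + t\psi_t, \quad \psi_t \in [\partial T(K\,\cdot\,;\tilde\alpha)](u_{t\tilde\alpha}). \tag{5.24} \]

Multiplying by $(K^\dagger)^*$ we deduce $f - Ku_{t\tilde\alpha} = t(K^\dagger)^*\psi_t$, so that referring back to (5.22), and using the definition of the subdifferential, we get for any $t > 0$ the estimate

\[ \begin{aligned} F(\bar f) - F(u_{t\tilde\alpha}) &\ge (tK^\dagger\varphi_{\bar f}|\psi_t) - \frac{C}{2}\|Ku_{t\tilde\alpha} - f\|_Y^2 \\ &= (u_{t\tilde\alpha} - (\bar f - tK^\dagger\varphi_{\bar f})|\psi_t) + (\bar f - u_{t\tilde\alpha}|\psi_t) - \frac{C}{2}\|Ku_{t\tilde\alpha} - f\|_Y^2 \\ &\ge T(Ku_{t\tilde\alpha};\tilde\alpha) - T(K(\bar f - tK^\dagger\varphi_{\bar f});\tilde\alpha) + (\bar f - u_{t\tilde\alpha}|\psi_t) - \frac{C}{2}\|Ku_{t\tilde\alpha} - f\|_Y^2. \end{aligned} \]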

Since uta solves (D'>'’^) for a = ta, using (5.24), we have 

(/ - wtalV’t) = \\\Kut^ - /lly > 2 (r(/; a) - T{Kuta; «)) • 

It follows 

F{f) - F{uta) > T{Kf-,a) - r(iL(/- ti^t^^-); a) + *--^^\\Kuta - /||^ (5.25) 

We see that (5.20) implies (5.14) if $0 < t \le C^{-1}$. □

Lemma 5.7. Suppose $\psi \in [\partial R(\,\cdot\,;\alpha)](Au)$ with $A^*\psi \in \mathcal{R}(K^*)$, and that $R(Au;\alpha) = T(Ku;\alpha)$. Then $A^*\psi \in [\partial T(K\,\cdot\,;\alpha)](u)$.

Proof. Let $\lambda \in Y^*$ be such that $K^*\lambda = A^*\psi$. By the definition of the subdifferential, we have

\[ R(Au'';\alpha) - R(Au;\alpha) \ge (\lambda,K(u'' - u)), \quad (u'' \in X). \]

Minimising over $u'' \in X$ with $Ku'' = Ku'$ for some $u' \in X$, and using $R(Au;\alpha) = T(Ku;\alpha)$, we deduce

\[ T(Ku';\alpha) - T(Ku;\alpha) \ge (\lambda,K(u' - u)), \quad (u' \in X). \]

Thus

\[ T(Ku';\alpha) - T(Ku;\alpha) \ge (A^*\psi,u' - u), \quad (u' \in X). \]

This proves the claim. □


Summarising the developments so far, we may state: 



Proposition 5.4. Suppose Assumption A-KA and A-$\delta$ hold. Let $Y$ be a Hilbert space, $f \in Y \cap \mathcal{R}(K)$, and

\[ \Phi(v) = \tfrac{1}{2}\|f - v\|_Y^2. \]

If $\bar f$ is $(K,2)$-minimal, $F$ is $(K,2)$-co-coercive at $(\bar f,K^*\varphi_{\bar f})$, and

\[ T(f;\tilde\alpha) > T(f - t\varphi_{\bar f};\tilde\alpha) \tag{5.26} \]

for some $\tilde\alpha \in \operatorname{int} P_\alpha$ and $t \in (0,1/C]$, then the claims of Proposition 5.2 hold.


Proof. It is easily checked that Assumption A-$\Phi$ holds. Lemma 5.6 then verifies the remaining conditions of Proposition 5.2, which shows the existence of $\mathcal{K}$ in both cases. Finally, Proposition 5.1 shows the existence of $\hat\alpha \in \operatorname{int} P_\alpha$ solving $(P^{\gamma,\epsilon})$. For the continuity of the solution map, we refer to Proposition 5.3. □


5.5. L^-squared fidelity with L^-squared cost 


We may finally finish the proof of our main result on the fidelity <h(u) := |||/ —ujjy, P = 
with the L^-squared cost functional F{u) = ^\\Kou — /o|||. 


Proof of Theorem 3.1. We have to verify the conditions of Proposition 5.4, primarily the $(K,2)$-minimality of $\bar f$, the $K$-co-coercivity of $F$, and (5.26). Regarding minimality and co-coercivity, we write $F = F_0 \circ K_0$, where $F_0(v) = \frac{1}{2}\|v - f_0\|_Z^2$. Then for any $v, v' \in Z$, we have

\[ F_0(v') - F_0(v) = (v' - v, v - f_0) + \tfrac{1}{2}\|v' - v\|^2_{L^2(\Omega)}. \]

From this $(I,2)$-co-coercivity of $F_0$ with $C = 1$ is clear, as is the $(I,2)$-minimality with regard to $F_0$ of every $v \in Z$. By extension, $F$ is easily seen to be $(K_0,2)$-co-coercive, and every $u \in X$ $(K_0,2)$-minimal. Using (3.1), $(K,2)$-co-coercivity of $F$ with $C = C_0$ is immediate, as is the $(K,2)$-minimality of every $u \in X$.
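The quadratic identity behind these co-coercivity and minimality claims is easy to check numerically. The Python sketch below evaluates both sides of the identity for the cost $F_0(v) = \frac{1}{2}\|v - f_0\|^2$ on a few hand-picked vectors in $\mathbb{R}^3$. The vectors and helper names are hypothetical and serve only as an illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cost(v, f0):
    # L2-squared cost F_0(v) = 0.5 * ||v - f0||^2.
    d = sub(v, f0)
    return 0.5 * dot(d, d)

def expansion(v_new, v, f0):
    # Right-hand side (v' - v, v - f0) + 0.5 * ||v' - v||^2.
    step = sub(v_new, v)
    return dot(step, sub(v, f0)) + 0.5 * dot(step, step)

f0 = [1.0, -2.0, 0.5]                      # hypothetical ground truth
pairs = [([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
         ([2.0, -1.0, 3.0], [-0.5, 4.0, 0.25])]

for v, v_new in pairs:
    lhs = cost(v_new, f0) - cost(v, f0)    # F_0(v') - F_0(v)
    rhs = expansion(v_new, v, f0)
    print(lhs, rhs)
```

Because the remainder is exactly $\frac{1}{2}\|v' - v\|^2$, the upper bound required for co-coercivity holds with $C = 1$ and $p = 2$, and the lower bound required for minimality holds for any $C \ge 0$.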


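Regarding (5.26), we need to find $\varphi_{\bar f}$ such that $K^*\varphi_{\bar f} = \nabla F(\bar f)$. We have

\[ K_0^*(K_0\bar f - f_0) = \nabla F(\bar f). \]

From this we observe that $\varphi_{\bar f}$ exists, because (3.1) implies $\mathcal{N}(K) \subset \mathcal{N}(K_0)$, and hence $\mathcal{R}(K_0^*) \subset \mathcal{R}(K^*)$. Here $\mathcal{N}$ and $\mathcal{R}$ stand for the nullspace and range, respectively. Setting $K^*\varphi_{\bar f} = K_0^*(K_0\bar f - f_0)$ and using $KK^\dagger = I$ on $\mathcal{R}(K)$, we thus find that

\[ \varphi_{\bar f} = (K_0K^\dagger)^*(K_0\bar f - f_0). \]

Observe that since $\mathcal{N}(K) \subset \mathcal{N}(K_0)$, this expression does not depend on the choice of $\bar f \in K^{-1}f$. Following Remark 2.2, we can replace $\bar f = K^\dagger f$. It follows that (3.3) implies (5.26). □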


Remark 5.3. Provided that $\Phi$ satisfies Assumption A-$\Phi$, A-$\delta$, and A-H, it is easy to extend Lemma 5.6 and consequently Theorem 3.1 to the case

\[ \Phi(v) = \tfrac{1}{2}\|v\|^2_{\hat Y} - (f|v)_{Y^*,Y}, \quad (v \in Y), \]

where $\hat Y \supset Y$ is a Hilbert space, $f \in Y^*$, and $Y$ still a reflexive Banach space. As $Y \subset \hat Y = \hat Y^* \subset Y^*$, in this case, we still have

\[ \nabla\Phi(v) = v - f \in Y^*. \]

In particular

\[ \nabla(\Phi \circ K)(u) = K^*(Ku - f) \in X^*. \]

Therefore the expression (5.23) still holds, which is the only place where we needed the specific form of $\Phi$.


Example 5.1 (Bingham flow). As a particular case of this remark, we take Y = Y = Then 

Y* = With / £ the Riesz representation theorem allows us to write 

J^fvdx = 

for some / £ which we may identify with /. Therefore, Theorem 3.1 can be extended to cover 

the Bingham flow of Example 2.7. In particular, we get the same condition for interior solutions as in 
Corollary 3.1, namely 

TV(/) > TV(/o). 


5.6. A more general technique for the $L^2$-squared fidelity

We now study another technique that does not require $(K,2)$-minimality and $(K,2)$-co-coercivity at $\bar f$. We still however require $\Phi$ to be the $L^2$-squared fidelity, and $u_{\tilde\alpha}$ to be $(K,p)$-minimal.

Lemma 5.8 (Natural conditions for the general $L^2$-squared case). Suppose Assumption A-KA and A-$\delta$ hold. Let $Y$ be a Hilbert space, $f \in Y \cap \mathcal{R}(K)$, and

\[ \Phi(v) = \tfrac{1}{2}\|f - v\|_Y^2. \]

The claims of Proposition 5.2 hold if for some $\tilde\alpha \in \operatorname{int} P_\alpha$ and $t > 0$ both $\bar f$ and the solution $u_{\tilde\alpha}$ are $(K,p)$-minimal and

\[ T(Ku_{\tilde\alpha};\tilde\alpha) > T(f - t\varphi_{u_{\tilde\alpha}};\tilde\alpha). \tag{5.27} \]

Remark 5.4. Setting $\tilde\alpha = s\bar\alpha$ in (5.27), we see employing the lower semicontinuity Lemma 4.2 that the former is implied by

\[ T(f;\bar\alpha) > \limsup_{s \searrow 0} T(f - t\varphi_s;\bar\alpha). \tag{5.28} \]

Here we use the shorthand $\varphi_s := \varphi_{u_{s\bar\alpha}}$. The difficulty is going to the limit, because we do not generally have any reasonable form of convergence of $\{\varphi_s\}_{s > 0}$. If we did indeed have $\varphi_s \to \varphi_{\bar f}$, then (5.28) and consequently (5.32) would be implied by the condition (5.26) we derived using $(K,2)$-co-coercivity. We will in the next subsection go to the limit with finite-dimensional functionals that are not $(K,2)$-co-coercive and hence the earlier theory does not apply.


Proof. Let us observe that (5.14) holds if for some a > 0 and C > 0, we can And a (A,p)-minimal 


Ua £ argmin J(m; a), 

u&X 


satisfying 


{pa\f - KUa) > 0 . 


(5.29) 


Here we denote for short ipa '■= recalling that K*ipa £ dF{ua). Indeed, by the definition of the 
subdifferential, the minimality of and (5.29), we deduce 


F{f) > F{Ua) + {Pa\f - KUa) > F{Ua). 


(5.30) 


This shows (5.14). 

We need to show that (5.27) implies (5.29). As in the proof of Lemma 5.6, we deduce by application of Lemma 5.7 that

0 € K* {Kua — f) + Ipa, for some fja ^ [dT{K • ; a)]{ua). (5.31) 

if-Uali’a) = {f-Ua\K*{f - KUa)) = \\f - KUafy > 0 - 


Then 


Multiplying (5.31) by (K^)* and using this estimate, we deduce for any t > 0 that 

- KUa) > - {Ua - ti^Va)|V’a) 

= t~^{Ua - (/- tit'Va)IV'a) + *“^7 “ 

> - (/- tit'Va)|V'a) 

> r^d(r(Kus;d) -r(K(/-ti^Va);«)) ■ 

The last step follows from the definition of $\partial T(K\,\cdot\,;\tilde\alpha)$. This proves that (5.27) implies (5.29). □

Summing up the developments so far, we may in contrast to Proposition 5.4 that depended on / 
and co-coercivity, state: 

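Proposition 5.5. Suppose Assumption A-KA and A-$\delta$ hold. Let $Y$ be a Hilbert space, $f \in Y \cap \mathcal{R}(K)$, and

$$\Phi(v) = \tfrac{1}{2}\|f - v\|_Y^2.$$

If for some $\alpha \in \operatorname{int} P_\alpha$, $t > 0$, the solution $u_\alpha$ is $(K,p)$-minimal with

$$T(Ku_\alpha; \alpha) > T(f - t\varphi_{u_\alpha}; \alpha), \tag{5.32}$$

then there exist $\bar\gamma > 0$ and $\bar\epsilon > 0$ such that the following hold.

(i) For each $\epsilon \in [0, \bar\epsilon]$ and $\gamma \in [\bar\gamma, \infty]$ there exists a compact set $\mathcal{K} \subset \operatorname{int} P_\alpha$ such that (5.1) holds. In particular there exists a solution $\hat\alpha \in \operatorname{int} P_\alpha$ to $(P^{\gamma,\epsilon})$.

(ii) If, moreover, Assumption A-H holds, there exists a compact set $\mathcal{K} \subset \operatorname{int} P_\alpha$ such that (5.15) holds and the solution map $S$ is outer semicontinuous within $[\bar\gamma, \infty] \times [0, \bar\epsilon]$.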

Proof. It is easily checked that Assumption A-$\Phi$ holds. Lemma 5.8 then verifies the remaining conditions of Proposition 5.2, which shows the existence of $\mathcal{K}$ in both cases. Finally, Proposition 5.1 shows the existence of $\hat\alpha \in \operatorname{int} P_\alpha$ solving $(P^{\gamma,\epsilon})$. For the continuity of the solution map, we refer to Proposition 5.3. □

5.7. $L^2$-squared fidelity with Huberised $L^1$-type cost

We now study the Huberised total variation cost functional. We cannot in general prove that solutions $u_\alpha$ for small $\alpha$ are better than $f$. Consider, for example, $f_0$ a step function, and $f$ a noisy version without the edge destroyed. The solution $u_\alpha$ might smooth out the edge, and then we might have $\|Du_\alpha - Df\|_{\mathcal{M}} \approx \|Du_\alpha\|_{\mathcal{M}} + \|Df\|_{\mathcal{M}} > \|Df_0 - Df\|_{\mathcal{M}} \approx 0$. This destroys all hope of verifying the conditions of Lemma 5.8 in the general case. If we however modify the set of test functions in the definition of $L^1_\eta\nabla$ to be discrete, we can prove this bound. Alternatively, we could assume uniformly bounded divergence from the family of test functions. We have left this case out for simplicity, and prove our results for general costs with finite-dimensional $Z$.

Lemma 5.9 (Conditions for $L^1_\eta$ cost). Suppose $Z$ is a reflexive Banach space, and $K_0 : X \to Z$ is linear and bounded, and satisfies (3.1). Then, whenever (5.26) holds, $F(u) := F_{L^1_\eta}(K_0 u)$ is $(K,1)$-co-coercive and (5.32) holds for some $\alpha \in \operatorname{int} P_\alpha$ with both $u_\alpha$ and $\bar f$ being $(K,1)$-minimal.

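Proof. Denote

$$B := \{\lambda \in Z^* \mid \|\lambda\|_{Z^*} \le 1\}.$$

We first verify $(I,1)$-co-coercivity of $F_{L^1_\eta}$. Let $v, v^* \in Z$, and $\lambda^* \in B$ be such that $\lambda^* \in \partial F_{L^1_\eta}(v^*)$. Clearly $\lambda^*$ achieves the maximum for $F_{L^1_\eta}(v^*)$. Let $\lambda \in B$ achieve the maximum for $F_{L^1_\eta}(v)$. Then

$$\begin{aligned}
F_{L^1_\eta}(v) - F_{L^1_\eta}(v^*) &\le (\lambda \mid v - f_0) - (\lambda \mid v^* - f_0) \\
&= (\lambda^* \mid v - v^*) + (\lambda - \lambda^* \mid v - v^*) \\
&\le (\lambda^* \mid v - v^*) + 2 \sup_{\lambda' \in B} \|\lambda'\| \|v - v^*\| \\
&\le (\lambda^* \mid v - v^*) + 2\|v - v^*\|.
\end{aligned} \tag{5.33}$$

This proves $(I,1)$-co-coercivity of $F_{L^1_\eta}$. $(K,1)$-co-coercivity of $F$ now follows similarly to the argument in the proof of Theorem 3.1, using (3.1).

Analogously, taking the triangle inequality in (5.33) in the opposite direction, we show that every $u \in X$ is $(K,1)$-minimal. Therefore, in particular both $\bar f$ and $u_\alpha$ are $(K,1)$-minimal.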

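To verify (5.32), it is enough to verify (5.28). Similarly to the proof of Theorem 3.1, using (3.1), we verify that $K^*\varphi_{s\bar\alpha} \in \partial F(u_{s\bar\alpha})$ exists, and

$$\varphi_{s\bar\alpha} \in (K^\dagger)^* \partial F(u_{s\bar\alpha}), \qquad \varphi_{s\bar\alpha} = (K_0 K^\dagger)^* \lambda_{s\bar\alpha},$$

where $\lambda_{s\bar\alpha} \in B$ achieves the maximum for $F_{L^1_\eta}(K_0 u_{s\bar\alpha} - f_0)$. In fact, $\varphi_{s\bar\alpha} \in \mathcal{R}(K)$. If this would not hold, we could find $v \perp \mathcal{R}(K)$ such that

$$0 < (v \mid \varphi_{s\bar\alpha}) = (K_0 K^\dagger v \mid \lambda_{s\bar\alpha}) \le \|K_0 K^\dagger v\| \|\lambda_{s\bar\alpha}\| \le C \|K K^\dagger v\| \|\lambda_{s\bar\alpha}\|.$$

But, for any $v' \in \mathcal{R}(K)$,

$$(v' \mid K K^\dagger v) = ((K K^\dagger)^* v' \mid v) = (v' \mid v) = 0.$$

Therefore $\|K K^\dagger v\| = 0$, and we reach a contradiction unless $\lambda_{s\bar\alpha} = 0$, that is $\varphi_{s\bar\alpha} = 0 \in \mathcal{R}(K)$.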


As is easily verified, $\partial F_{L^1_\eta}$ is outer semicontinuous with respect to strong convergence in the domain $Z$ and weak* convergence in the codomain $Z^*$. That is, given $v^k \to v$ and $z^k \rightharpoonup^* z$ with $z^k \in \partial F_{L^1_\eta}(v^k)$, we have $z \in \partial F_{L^1_\eta}(v)$. By Lemma 5.4 and (3.1), we have $K_0 u_{s\bar\alpha} \to K_0 \bar f$ strongly in $Z$ as $s \searrow 0$. Since $B$ is bounded, we may therefore find a sequence $s^k \searrow 0$ with $\lambda_{s^k\bar\alpha} \rightharpoonup^* \lambda_0 \in \partial F_{L^1_\eta}(K_0 \bar f)$ weakly* in $Z^*$. Since by assumption $K_0$ is compact, then also $K_0^*$ is compact [40, Theorem 4.19]. Consequently $\varphi_{s^k\bar\alpha} \to \varphi_0 := (K_0 K^\dagger)^* \lambda_0$ strongly in $Y$ after possibly moving to an unrelabelled subsequence. Let us now consider the right hand side of (5.32) for $\alpha = s^k\bar\alpha$. Since $f = K\bar f$, and we have proved that $\varphi_{s^k\bar\alpha} \in \mathcal{R}(K)$, Lemma 4.2 shows that

$$= T(f - t\varphi_0; \bar\alpha).$$

Minding the discussion surrounding (5.28), we observe that choosing $\alpha = s^k\bar\alpha$ for large enough $k > 0$, (5.28) is implied by (5.26). □
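The bound (5.33) from the proof above can be checked numerically in a simple finite-dimensional setting. The following is only a sketch under stated assumptions: it takes $Z = \mathbb{R}^d$ with the Euclidean norm and assumes the Huberised cost has the dual form $F(v) = \sup_{\|\lambda\| \le 1} (\lambda \mid v - f_0) - \tfrac{\eta}{2}\|\lambda\|^2$, which is a common form of Huberisation but is not spelled out in this section. The names `huber_cost` and `check_533` are hypothetical.

```python
import numpy as np

def huber_cost(v, f0, eta):
    """Value and maximiser of sup_{|lam|<=1} <lam, v - f0> - eta/2 |lam|^2.

    Assumed form of the Huberised cost (not taken from the paper).
    The maximiser is w/eta if |w| <= eta and w/|w| otherwise, w = v - f0.
    """
    w = v - f0
    lam = w / max(eta, np.linalg.norm(w))
    return lam @ w - 0.5 * eta * (lam @ lam), lam

def check_533(trials=1000, dim=5, eta=0.1, seed=0):
    """Test F(v) - F(v*) <= <lam*, v - v*> + 2 |v - v*| on random pairs."""
    rng = np.random.default_rng(seed)
    f0 = rng.normal(size=dim)
    for _ in range(trials):
        v, v_star = rng.normal(size=dim), rng.normal(size=dim)
        F_v, _ = huber_cost(v, f0, eta)
        F_star, lam_star = huber_cost(v_star, f0, eta)
        bound = lam_star @ (v - v_star) + 2 * np.linalg.norm(v - v_star)
        if F_v - F_star > bound + 1e-12:
            return False
    return True
```

Calling `check_533()` samples random pairs $(v, v^*)$ and reports whether the inequality held on all of them; a passing run illustrates the estimate but does not replace the proof.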


Proof of Theorem 3.2. From the proof of Lemma 5.9, we observe that (5.26) can be expanded as

$$T(f; \bar\alpha) > T(f - t(K_0 K^\dagger)^* \lambda_0; \bar\alpha),$$

where $\lambda_0 \in B$ with $\lambda_0 \in \partial F_{L^1_\eta}(K_0 \bar f)$. As in the proof of Theorem 3.1, this is in fact independent of the choice of $\bar f$, so we may replace $\bar f = K^\dagger f$. Thus $\lambda_0 \in \partial F_{L^1_\eta}(K_0 K^\dagger f)$. By Lemma 5.9, the conditions of Proposition 5.5 are satisfied, so we may apply it together with Proposition 5.1 to conclude the proof. □

Remark 5.5. The considerations of Remark 5.3 also apply to Lemma 5.8 and consequently Theorem 3.2. That is, the results hold for the cost

$$\Phi(v) = \tfrac{1}{2}\|v\|_{\hat Y}^2 - (f \mid v)_{Y^*,Y}, \quad (v \in Y), \tag{5.34}$$

where $\hat Y \supset Y$ is a Hilbert space, $f \in Y^*$, and $Y$ a reflexive Banach space. Indeed, again the specific form of $\Phi$ was only used for the optimality condition (5.31), which is also satisfied by the form (5.34).



(a) Original image (b) Noisy image 


Figure 1: Parrot test image 

6. Numerical verification and insight 

In order to verify the above theoretical results, and to gain further insight into the cost map, we computed its values for a grid of values of $\alpha$, for both TV and TGV$^2$ denoising, and both $L^2_2$ and $L^1_\eta\nabla$ cost functionals. This we did for two different images, the parrot image depicted in Figure 1 and the Scottish southern uplands image depicted in Figure 2. The results are visualised in Figure 3 and Figure 4, respectively. For TV, the parameter range was

$$\alpha \in U := \{0.001, 0.01, 0.02, \ldots, 0.5\}/n$$

(altogether 51 values), where $n = 256$ is the edge length of the rectangular test image. For TGV$^2$ the parameter range was $\alpha \in U \times (U/n)$. We set $\gamma = 100$, $\epsilon = 10^{-10}$, and computed the denoised image $u_{\alpha,\gamma,\epsilon}$ by the SSN denoising algorithm that we report separately in [24] with more extensive numerical comparisons and further applications.
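As an illustration of the parameter ranges just described, the grids can be built as follows. This is a minimal sketch assuming NumPy; the variable names are hypothetical, and the denoising step itself (the SSN algorithm of [24]) is not reproduced here.

```python
import numpy as np

n = 256  # edge length of the test image, as stated in the text

# Base values 0.001 followed by 0.01, 0.02, ..., 0.5 (50 values).
base = np.concatenate(([0.001], np.arange(1, 51) / 100.0))

# TV parameter range U: the base values scaled by 1/n.
U = base / n

# TGV^2 parameter range: Cartesian product U x (U / n).
tgv_grid = [(a, b) for a in U for b in U / n]
```

The array `U` has 51 entries, matching the count given above, and `tgv_grid` has $51 \times 51$ parameter pairs.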

As we can see, the optimal $\alpha$ clearly seems to lie away from the boundary of the parameter domain $P_\alpha$, confirming the theoretical studies for the squared $L^2$ cost $L^2_2$, and the discrete version of the Huberised TV cost $L^1_\eta\nabla$. The question remains: do these results hold for the full Huberised TV?

We further observe from the numerical landscapes that the cost map is roughly quasiconvex in the variable $\alpha$ for both TV and TGV$^2$. In the $\beta$ variable of TGV$^2$ the same does not seem to hold, as around the optimal solution the level sets tend to expand along $\alpha$ as $\beta$ increases, until starting to reach their limit along $\beta$. However, the level sets around the optimal solution also tend to be very elongated along the $\beta$ axis. This suggests that TGV$^2$ is reasonably robust with respect to choice of $\beta$, as long as it is in the right range.
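The quasiconvexity observation can be tested on sampled cost values along one parameter axis. The sketch below is an assumption-laden illustration, not the procedure used for the figures: it treats a one-dimensional slice of the cost landscape as a sequence and checks that it is non-increasing up to its minimum and non-decreasing afterwards, which characterises quasiconvexity of a sampled function of one variable. The function name and the tolerance parameter are hypothetical.

```python
import numpy as np

def is_quasiconvex_1d(values, tol=0.0):
    """Check whether a sampled 1D sequence decreases, then increases.

    `tol` allows small violations, e.g. from numerical noise.
    """
    values = np.asarray(values, dtype=float)
    m = int(np.argmin(values))   # index of the (first) minimum
    diffs = np.diff(values)      # successive differences
    # Non-increasing before the minimum, non-decreasing after it.
    return bool(np.all(diffs[:m] <= tol) and np.all(diffs[m:] >= -tol))
```

For example, `is_quasiconvex_1d([3, 2, 1, 2, 3])` holds, while `is_quasiconvex_1d([1, 3, 2, 4])` fails because the sequence rises, falls and rises again.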
Figure 2: Uplands test image


A data statement for the EPSRC 

This is a theoretical mathematics paper, and any data used merely serves as a demonstration of 
mathematically proven results. Moreover, photographs that are for all intents and purposes statistically 
comparable to the ones used for the final experiments, can easily be produced with a digital camera, 
or downloaded from the internet. This will provide a better evaluation of the results than the use of 
exactly the same data as we used. 